Автор: Tom White
Название: Hadoop: The Definitive Guide
Organizations large and small are adopting Apache Hadoop to deal with huge application datasets. Hadoop: The Definitive Guide provides you with the key for unlocking the wealth this data holds. Hadoop is ideal for storing and processing massive amounts of data, but until now, information on this open source project has been lacking -- especially with regard to best practices. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems. Programmers will find details for analyzing large datasets with Hadoop, and administrators will learn how to set up and run Hadoop clusters. This book helps you: Use the Hadoop Distributed File System (HDFS) for storing large datasets, and running distributed computations over those datasets, using MapReduce Become familiar with Hadoop's data and IO building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Use Pig, a high-level query language for large-scale data processing Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use HBase, Hadoop's database for structured and semi-structured data
This book includes case studies that illustrate how Hadoop is used to solve specific problems. If you're considering Hadoop, or already use it, Hadoop: The Definitive Guide is the most thorough book available on the subject.