Learn how to configure your Hadoop cluster to run optimal MapReduce jobs Overview Optimize your MapReduce job performance Identify your Hadoop cluster's weaknesses Tune your MapReduce configuration In Detail MapReduce is the distribution system that the Hadoop MapReduce engine uses to distribute work around a cluster by working parallel on smaller data sets. It is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, and statistical machine translation. This book introduces you to advanced MapReduce concepts and teaches you everything from identifying the facto...