Optimizing Hadoop for MapReduce

Book Details:

Publisher: Packt Publishing
Series: Packt
Author: Khaled Tannir
Edition: 1
ISBN-10: 1783285656
ISBN-13: 9781783285655
Pages: 120
Published: Feb 21 2014
Posted: Nov 19 2014
Language: English
Book format: PDF
Book size: 2 MB

Book Description:

Learn how to configure your Hadoop cluster to run optimal MapReduce jobs.

Overview:

Optimize your MapReduce job performance
Identify your Hadoop cluster's weaknesses
Tune your MapReduce configuration

In Detail:

MapReduce is the distribution system that the Hadoop MapReduce engine uses to distribute work around a cluster by working in parallel on smaller data sets. It is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, and statistical machine translation.

This book introduces you to advanced MapReduce concepts and teaches you everything from identifying the factors that affect MapReduce job performance to tuning the MapReduce configuration. Based on real-world experience, this book will help you to fully utilize your cluster's node resources to run MapReduce jobs optimally.

This book details the Hadoop MapReduce job performance optimization process. Through a number of clear and practical steps, it will help you to fully utilize your cluster's node resources.

Starting with how MapReduce works and the factors that affect MapReduce performance, you will be given an overview of Hadoop metrics and several performance monitoring tools. Further on, you will explore performance counters that help you identify resource bottlenecks, check cluster health, and size your Hadoop cluster. You will also learn about optimizing map and reduce tasks by using Combiners and compression.

The book ends with best practices and recommendations on how to use your Hadoop cluster optimally.

What you will learn from this book:

Learn about the factors that affect MapReduce performance
Utilize the Hadoop MapReduce performance counters to identify resource bottlenecks
Size your Hadoop cluster's nodes
Set the number of mappers and reducers correctly
Optimize mapper and reducer task throughput and code size using compression and Combiners
Understand the various tuning properties and best practices to optimize clusters

Approach:

This book is an example-based tutorial that deals with optimizing MapReduce job performance.

Who this book is written for:

If you are a Hadoop administrator, developer, MapReduce user, or beginner, this book is the best choice available if you wish to optimize your clusters and applications. Prior knowledge of creating MapReduce applications is not necessary, but it will help you better understand the concepts and the snippets of MapReduce class template code.

Related Books:

Hadoop For Dummies

Let Hadoop For Dummies help harness the power of your data and rein in the information overload. Big data has become big business, and companies and organizations of all sizes are struggling to find ways to retrieve valuable information from their massive data sets without becoming overwhelmed. Enter Hadoop and this easy-to-understand For Dummies guide. Hadoop For Dummies helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
Explains the origins of Hadoop, its economic benefits, and its functionality and practical applications
Helps you find your way around the Hadoop ecosystem, program MapReduce, utilize design patterns, and get your Hadoop ...

Hadoop for Finance Essentials

This book is perfect for developers, analysts, architects, or managers who would like to perform big data analytics with Hadoop for the financial sector. It is also helpful for technology professionals from other industry sectors who have recently switched, or would like to switch, their business domain to the financial sector. Familiarity with big data, Java programming, databases and data warehouses, and business intelligence would be beneficial....

Apache Hadoop YARN

Moving beyond MapReduce and Batch Processing with Apache Hadoop 2
'This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm.'
-From the Foreword by Raymie Stata, CEO of Altiscale

The Insider's Guide to Building Distributed, Big Data Applications with Apache Hadoop YARN

Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revol...



2007 - 2021 © eBooks-IT.org