Steve Hoffman eBooks

Download free Steve Hoffman eBooks

Apache Flume

Distributed Log Collection for Hadoop

2nd Edition

Design and implement a series of Flume agents to send streamed data into Hadoop.

About This Book

Construct a series of Flume agents using the Apache Flume service to efficiently collect, aggregate, and move large amounts of event data.

Configure failover paths and load balancing to remove single points of failure.

Use this step-by-step guide to stream logs from application servers to Hadoop's HDFS.

Who This Book Is For

If you are a Hadoop programmer who wants to learn about Flume to be able to move datasets into Hadoop in a timely and replicable manner, then this book is ideal for you. No prior knowledge about Apache Flume is necessary, but a basic knowledge of...

Apache Flume

Distributed Log Collection for Hadoop

If your role includes moving datasets into Hadoop, this book will help you do it more efficiently using Apache Flume. From installation to customization, it's a complete step-by-step guide to making the service work for you.

Overview

Integrate Flume with your data sources.

Transcode your data en route in Flume.

Route and separate your data using regular expression matching.

Configure failover paths and load balancing to remove single points of failure.

Use gzip compression for files written to HDFS.

In Detail

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexib...