Mastering Apache Spark


Uploaded by Enigma69

Book Details:

Publisher: Packt Publishing
Series: Packt, Mastering
Author: Mike Frampton
Edition: 1
ISBN-10: 1783987146
ISBN-13: 9781783987146
Pages: 318
Published: Sep 30 2015
Posted: Feb 27 2016
Language: English
Book format: PDF
Book size: 6.97 MB

Book Description:

Gain expertise in processing and storing data by using advanced techniques with Apache Spark.

About This Book

* Explore the integration of Apache Spark with third-party applications such as H2O, Databricks, and Titan
* Evaluate how Cassandra and HBase can be used for storage
* An advanced guide that combines instructions and practical examples to extend the most up-to-date Spark functionality

Who This Book Is For

If you are a developer with some experience with Spark and want to strengthen your knowledge of how to get around in the world of Spark, then this book is ideal for you. Basic knowledge of Linux, Hadoop, and Spark is assumed. Reasonable knowledge of Scala is expected.

What You Will Learn

* Extend the tools available for processing and storage
* Examine clustering and classification using MLlib
* Discover Spark stream processing via Flume and HDFS
* Create a schema in Spark SQL, and learn how a Spark schema can be populated with data
* Study Spark-based graph processing using Spark GraphX
* Combine Spark with H2O and deep learning, and learn why it is useful
* Evaluate how graph storage works with Apache Spark, Titan, HBase, and Cassandra
* Use Apache Spark in the cloud with Databricks and AWS

In Detail

Apache Spark is an in-memory, cluster-based parallel processing system that provides a wide range of functionality, including graph processing, machine learning, stream processing, and SQL. It operates at unprecedented speeds, is easy to use, and offers a rich set of data transformations.

This book aims to take your limited knowledge of Spark to the next level by teaching you how to expand Spark functionality. The book commences with an overview of the Spark ecosystem. You will learn how to use MLlib to create a fully working neural net for handwriting recognition. You will then discover how stream processing can be tuned for optimal performance and to ensure parallel processing. The book goes on to show how to incorporate H2O for machine learning, Titan for graph-based storage, and Databricks for cloud-based Spark. Intermediate Scala-based code examples are provided for Apache Spark module processing in a CentOS Linux and Databricks cloud environment.

Style and Approach

This book is an extensive guide to Apache Spark modules and tools, and shows with worked examples how Spark's functionality can be extended for real-time processing and storage.

Related Books:

Mastering Apache Cassandra

Get comfortable with the fastest NoSQL database, its architecture, key programming patterns, infrastructure management, and more! Overview: Complete coverage of all aspects of Cassandra. Discusses prominent patterns, pros and cons, and use cases. Contains briefs on integration with other software. In Detail: Apache Cassandra is the perfect choice for building fault-tolerant and scalable databases. Implementing Cassandra will enable you to take advantage of its features, which include replication of data across multiple datacenters with lower latency rates. This book details these features that will guide you towards mastering the art of building high-performing databases without compromising on performance. Mastering Apache Cassandra aims to give enough ...

Mastering Apache Velocity

A comprehensive tutorial on how to use the power of Velocity 1.3 to build Web sites and generate content. Designed to work hand-in-hand with Apache Turbine, Struts, and servlets, Velocity is a powerful template language that greatly enhances the developer's ability to customize Web sites. It separates Java code from the Web pages, making a site more maintainable. Because of this, it is a viable alternative to JSPs and PHP and is expected to become the standard template engine. In addition to its use with Struts and Turbine, Velocity can also be used to generate Java and XML source code, XML schemas, HTML templates, and SQL code. Even with all its promise, finding expert instructions on how to properly program with this language has been difficult. This ...

Mastering Apache Camel

This book is intended for all Camel users who want to get the best out of Camel, and who want to implement the most efficient integration logic using best practices....



2007 - 2021 © eBooks-IT.org