Compression Schemes for Mining Large Datasets

A Machine Learning Perspective

Book Details:

Publisher:	Springer
Series:	Springer
Author:	M. Narasimha Murty
Edition:	1
ISBN-10:	1447156064
ISBN-13:	9781447156062

Pages:	197
Published:	Nov 14 2013
Posted:	Mar 24 2015
Language:	English
Book format:	PDF
Book size:	2.4 MB

Book Description:

This book addresses the challenges of generating data abstractions using the least number of database scans, compressing data through novel lossy and non-lossy schemes, and carrying out clustering and classification directly in the compressed domain. The schemes presented are shown to be efficient in both space and time while providing the same or better classification accuracy. Features: describes a non-lossy compression scheme based on run-length encoding of patterns with binary-valued features; proposes a lossy compression scheme that recognizes a pattern as a sequence of features and identifies subsequences; examines whether the identification of prototypes and features can be achieved simultaneously through lossy compression and efficient clustering; discusses ways to make use of domain knowledge in generating abstraction; reviews optimal prototype selection using genetic algorithms; suggests possible ways of dealing with big data problems using multiagent systems.
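To make the idea of working "directly in the compressed domain" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the book's actual scheme: the function names `rle_encode` and `hamming_rle` are hypothetical, the run-length convention (runs alternate starting with 0s) is an assumption, and Hamming distance is chosen only as a simple dissimilarity that can be computed from run lengths without decompressing.

```python
from itertools import groupby


def rle_encode(bits):
    """Encode a binary pattern as alternating run lengths, starting with 0s.

    Assumed convention (not from the book): even-indexed runs are 0s and
    odd-indexed runs are 1s, so a pattern starting with 1 gets a leading
    run of length 0.
    """
    runs = [len(list(group)) for _, group in groupby(bits)]
    if bits and bits[0] == 1:
        runs.insert(0, 0)
    return runs


def hamming_rle(runs_a, runs_b):
    """Hamming distance between two equal-length patterns, using runs only."""
    if sum(runs_a) != sum(runs_b):
        raise ValueError("patterns must have the same length")
    # Attach each run's bit value (index parity) and drop empty runs.
    a = [(i % 2, n) for i, n in enumerate(runs_a) if n > 0]
    b = [(i % 2, n) for i, n in enumerate(runs_b) if n > 0]
    i = j = 0
    rem_a = a[0][1] if a else 0
    rem_b = b[0][1] if b else 0
    distance = 0
    while i < len(a) and j < len(b):
        # Advance over the stretch where both current runs overlap.
        step = min(rem_a, rem_b)
        if a[i][0] != b[j][0]:
            distance += step
        rem_a -= step
        rem_b -= step
        if rem_a == 0:
            i += 1
            if i < len(a):
                rem_a = a[i][1]
        if rem_b == 0:
            j += 1
            if j < len(b):
                rem_b = b[j][1]
    return distance


x = [0, 0, 1, 1, 1, 0, 1, 1]
y = [1, 1, 1, 0, 0, 0, 1, 1]
print(rle_encode(x), rle_encode(y), hamming_rle(rle_encode(x), rle_encode(y)))
```

The distance loop touches each run once, so its cost depends on the number of runs rather than the number of features, which is where the space and time savings would come from for patterns with long runs.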

Related Books:

Advanced Neural Network-Based Computational Schemes for Robust Fault Diagnosis

The present book is devoted to problems of adaptation of artificial neural networks to robust fault diagnosis schemes. It presents neural network-based modelling and estimation techniques used for designing robust fault diagnosis schemes for non-linear dynamic systems. A part of the book focuses on fundamental issues such as architectures of dynamic neural networks, methods for designing neural networks and fault diagnosis schemes, and the importance of robustness. The book is of tutorial value and can be perceived as a good starting point for newcomers to this field. The book is also devoted to advanced schemes of description of neural model uncertainty. In particular, the methods of computation of neural networks uncertainty with ro...

21 Recipes for Mining Twitter

Millions of public Twitter streams harbor a wealth of data, and once you mine them, you can gain some valuable insights. This short and concise book offers a collection of recipes to help you extract nuggets of Twitter information using easy-to-learn Python tools. Each recipe offers a discussion of how and why the solution works, so you can quickly adapt it to fit your particular needs. The recipes include techniques to: use OAuth to access Twitter data; create and analyze graphs of retweet relationships; use the streaming API to harvest tweets in real time; harvest and analyze friends and followers; discover friendship cliques; summarize webpages from short URLs. This book is a perfect companion to O'Reilly's Mining the Social Web....

Text Analysis Pipelines

Towards Ad-hoc Large-Scale Text Mining

This monograph proposes a comprehensive and fully automatic approach to designing text analysis pipelines for arbitrary information needs that are optimal in terms of run-time efficiency and that robustly mine relevant information from text of any kind. Based on state-of-the-art techniques from machine learning and other areas of artificial intelligence, novel pipeline construction and execution algorithms are developed and implemented in prototypical software. Formal analyses of the algorithms and extensive empirical experiments underline that the proposed approach represents an essential step towards the ad-hoc use of text mining in web search and big data analytics. Both web search and big data analytics aim to fulfill people's needs for information...