
Taming Text

How to Find, Organize, and Manipulate It


Book Details:

Publisher: Manning Publications
Series: Manning, How To
Authors: Grant S. Ingersoll, Thomas S. Morton, Andrew L. Farris
Published: Jan 21, 2013
Posted: Nov 19, 2014
Book format: PDF
Book size: 4.64 MB

Book Description:

Summary
Taming Text is a hands-on, example-driven guide to working with unstructured text in real-world applications. This book explores how to automatically organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. The book guides you through examples illustrating each of these topics, as well as the foundations upon which they are built.

About this Book
There is so much text in our lives that we are practically drowning in it. Fortunately, there are innovative tools and techniques for managing unstructured information that can throw the smart developer a much-needed lifeline. You'll find them in this book.

Taming Text is a practical, example-driven guide to working with text in real applications. This book introduces you to useful techniques like full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. You'll explore real use cases as you systematically absorb the foundations upon which they are built. Written in a clear and concise style, this book avoids jargon, explaining the subject in terms you can understand without a background in statistics or natural language processing. The book is written for Java developers and its examples are in Java, but the concepts can be applied in any language.

Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. All code from the book is also available.

What's Inside
- When to use text-taming techniques
- Important open-source libraries like Solr and Mahout
- How to build text-processing applications

About the Authors
Grant Ingersoll is an engineer, speaker, and trainer, a Lucene committer, and a cofounder of the Mahout machine-learning project. Thomas Morton is the primary developer of OpenNLP and Maximum Entropy. Drew Farris is a technology consultant, software developer, and contributor to Mahout, Lucene, and Solr.

"Takes the mystery out of very complex processes."
From the Foreword by Liz Liddy, Dean, iSchool, Syracuse University

Table of Contents
1. Getting started taming text
2. Foundations of taming text
3. Searching
4. Fuzzy string matching
5. Identifying people, places, and things
6. Clustering text
7. Classification, categorization, and tagging
8. Building an example question answering system
9. Untamed text: exploring the next frontier


Related Books:

Word Hacks

Tips & Tools for Taming Your Text
As one of the applications in Microsoft Office, Word is the dominant word-processing program for both Windows and Mac users. Millions of people around the globe use it. But many, if not most, of them barely skim the surface of what is possible with Microsoft Word. Seduced by the application's supposed simplicity, they settle for just what's obvious, even if it doesn't satisfy their wants and needs. They may curse the wretched Bullets and Numbering buttons multiple times a day or take hours to change the font size of every heading in a lengthy report, yet they're reluctant to dig deeper to take advantage of Word's immense capabilities and limitless customization tools. Let Word Hacks be your shovel. Let it carve your way into Word and make this most po...

Text Analysis Pipelines

Towards Ad-hoc Large-Scale Text Mining
This monograph proposes a comprehensive and fully automatic approach to designing text analysis pipelines for arbitrary information needs; the resulting pipelines are optimal in terms of run-time efficiency and robustly mine relevant information from text of any kind. Based on state-of-the-art techniques from machine learning and other areas of artificial intelligence, novel pipeline construction and execution algorithms are developed and implemented in prototypical software. Formal analyses of the algorithms and extensive empirical experiments show that the proposed approach represents an essential step towards the ad-hoc use of text mining in web search and big data analytics. Both web search and big data analytics aim to fulfill people's needs for information...

Java Regular Expressions

Taming the java.util.regex Engine
Java has always been an excellent language for working with objects. But Java's text manipulation mechanisms have always been limited compared to languages like AWK and Perl. However, the regular expressions package in Java 2 Standard Edition (J2SE) brings hope to Java text processing. This package provides everything necessary to use regular expressions, all packaged in a simplified object-oriented framework. In addition to working examples and best practices, this book features a detailed API reference with examples supporting nearly every method, and a step-by-step tutorial for creating your own regular expressions. With time, you'll discover that regular expressions are extremely powerful in your programming arsenal, and you'll enjoy us...

2007 - 2021 © eBooks-IT.org