Data-Intensive Text Processing with MapReduce

Jimmy Lin, Chris Dyer
ISBN: 9781608453429 | PDF ISBN: 9781608453436
Copyright © 2010 | 177 Pages | Publication Date: 01/01/2010

DOI: 10.2200/S00274ED1V01Y201006HLT007

Ordering Options: Paperback $40.00   E-book $32.00   Paperback & E-book Combo $50.00

Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance. This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains. This book aims to help the reader "think in MapReduce" and also discusses the limitations of the programming model.

Table of Contents

MapReduce Basics
MapReduce Algorithm Design
Inverted Indexing for Text Retrieval
Graph Algorithms
EM Algorithms for Text Processing
Closing Remarks

About the Author(s)

Jimmy Lin, University of Maryland
Jimmy Lin is an Associate Professor in the iSchool (College of Information Studies) at the University of Maryland, College Park. He directs the recently formed Cloud Computing Center, an interdisciplinary group that explores the many aspects of cloud computing as it impacts technology, people, and society. Lin's research lies at the intersection of natural language processing and information retrieval, with a recent emphasis on scalable algorithms and large-data processing. He received his Ph.D. from MIT in Electrical Engineering and Computer Science in 2004.

Chris Dyer, University of Maryland
Chris Dyer is graduating with a Ph.D. in Linguistics from the University of Maryland, College Park in June 2010 and will be joining the Language Technologies Institute at Carnegie Mellon University as a postdoctoral researcher. His research interests include statistical machine translation and machine learning, and he has served as a reviewer for numerous conferences and journals in the areas of natural language processing and computational linguistics. He first became acquainted with MapReduce in 2007 using Hadoop version 0.13.0, and gained further experience with MapReduce during an internship with Google Research in 2008.

