This e-book provides an introduction to the knowledge and skills required to build and deploy a variety of text information systems for managing and analyzing vast amounts of text data effectively and efficiently.
The dramatic growth of natural language text data (web pages, news articles, scientific literature, social media, email, etc.) has led to an increasing demand for powerful software tools that can help in discovering knowledge about human opinions and preferences, in addition to many other kinds of knowledge that we encode in text. There are several approaches toward reaching this goal, and this book systematically lays a foundation of the major concepts, techniques, and ideas in information retrieval and text data mining from a practical viewpoint. It is both a textbook for university computer science students and a reference for practicing professionals.
Table of Contents
Part I: Overview and Background
1. Introduction
2. Background
3. Text Data Understanding
4. MeTA: A Unified Toolkit for Text Data Management and Analysis
Part II: Text Data Access
5. Overview of Text Data Access
6. Retrieval Models
7. Feedback
8. Search Engine Implementation
9. Search Engine Evaluation
10. Web Search
11. Recommender Systems
Part III: Text Data Analysis
12. Overview of Text Data Analysis
13. Word Association Mining
14. Text Clustering
15. Text Categorization
16. Text Summarization
17. Topic Analysis
18. Opinion Mining and Sentiment Analysis
19. Joint Analysis of Text and Structured Data
Part IV: Unified Text Data Management Analysis System
20. Toward a Unified System for Text Management and Analysis
About the Author(s)
ChengXiang Zhai, University of Illinois at Urbana-Champaign
ChengXiang Zhai is a Professor of Computer Science and Willett Faculty Scholar at the University of Illinois at Urbana-Champaign, where he is also affiliated with the Graduate School of Library and Information Science, Institute for Genomic Biology, and Department of Statistics. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and then Senior Research Scientist from 1997 to 2000. His research interests include information retrieval, text mining, natural language processing, machine learning, biomedical and health informatics, and intelligent education information systems. He has published over 200 research papers in major conferences and journals. He is an Associate Editor for Information Processing and Management and previously served as an Associate Editor of ACM Transactions on Information Systems and on the editorial board of Information Retrieval Journal. He has served as conference program co-chair of ACM CIKM 2004, NAACL HLT 2007, ACM SIGIR 2009, ECIR 2014, ICTIR 2015, and WWW 2015, and as conference general co-chair of ACM CIKM 2016. He is an ACM Distinguished Scientist and a recipient of multiple awards, including the ACM SIGIR 2004 Best Paper Award, the ACM SIGIR 2014 Test of Time Paper Award, Alfred P. Sloan Research Fellowship, IBM Faculty Award, HP Innovation Research Program Award, Microsoft Beyond Search Research Award, and the Presidential Early Career Award for Scientists and Engineers (PECASE).
Sean Massung, University of Illinois at Urbana-Champaign
Sean Massung is a Ph.D. candidate in computer science at the University of Illinois at Urbana-Champaign, where he also received both his B.S. and M.S. degrees. He is a co-founder of MeTA and uses it in all of his research. He has been an instructor for CS 225: Data Structures and Programming Principles, CS 410: Text Information Systems, and CS 591txt: Text Mining Seminar. He is included in the 2014 List of Teachers Ranked as Excellent at the University of Illinois and has received an Outstanding Teaching Assistant Award and a CS@Illinois Outstanding Research Project Award. He has given talks at Jump Labs Champaign and at UIUC for the Data and Information Systems Seminar, Intro to Big Data, and the Teaching Assistant Seminar. His research interests include text mining applications in information retrieval, natural language processing, and education.