The "big data" era is characterized by an explosion of information in the form of digital data collections, ranging from scientific knowledge to social media, news, and everyday life. Examples of such collections include scientific publications, enterprise logs, news articles, social media, and general web pages. Valuable knowledge about multi-typed entities is often hidden in unstructured or loosely structured, interconnected data. Mining latent structures around entities uncovers hidden knowledge such as implicit topics, phrases, entity roles, and relationships. In this monograph, we investigate the principles and methodologies of mining latent entity structures from massive unstructured and interconnected data. We propose a text-rich information network model for representing data from many different domains. This model leads to a series of new principles and powerful methodologies for mining latent structures, including (1) latent topical hierarchies, (2) quality topical phrases, (3) entity roles in hierarchical topical communities, and (4) entity relations. The book also introduces applications enabled by the mined structures and points out some promising research directions.
Table of Contents
Hierarchical Topic and Community Discovery
Topical Phrase Mining
Entity Topical Role Analysis
Mining Entity Relations
Scalable and Robust Topic Discovery
Application and Research Frontier
About the Author(s)
Chi Wang, Microsoft Research
Chi Wang is a researcher at Microsoft Research, Redmond, Washington. He received his Ph.D. in computer science from the University of Illinois at Urbana-Champaign in 2014, and he graduated from Tsinghua University, China, in 2009. His research focuses on data mining, information network analysis, and text mining. He was the first recipient of the prestigious Microsoft Research Graduate Research Fellowship in the history of the Department of Computer Science at the University of Illinois at Urbana-Champaign.
Jiawei Han, University of Illinois at Urbana-Champaign
Jiawei Han is the Abel Bliss Professor in the Department of Computer Science at the University of Illinois. His research interests include data mining, information network analysis, and database systems, and he has over 600 publications. He served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data (TKDD). Jiawei has received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and the Daniel C. Drucker Eminent Faculty Award at UIUC (2011). He is a Fellow of ACM and a Fellow of IEEE. He is currently the Director of the Information Network Academic Research Center (INARC), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab. His co-authored textbook Data Mining: Concepts and Techniques (Morgan Kaufmann) has been adopted worldwide.