Probabilistic Databases

Dan Suciu, Dan Olteanu, Christopher Re, Christoph Koch
ISBN: 9781608456802 | PDF ISBN: 9781608456819
Copyright © 2011 | 180 Pages | Publication Date: 01/01/2011

DOI: 10.2200/S00362ED1V01Y201105DTM016

Probabilistic databases are databases in which the value of some attributes, or the presence of some records, is uncertain and known only with some probability. Applications in many areas, such as information extraction, RFID and scientific data management, data cleaning, data integration, and financial risk assessment, produce large volumes of uncertain data, which are best modeled and processed by a probabilistic database.

This book presents the state of the art in representation formalisms and query processing techniques for probabilistic data. It starts by discussing the basic principles for representing large probabilistic databases by decomposing them into tuple-independent tables, block-independent-disjoint tables, or U-databases. It then discusses two classes of techniques for query evaluation on probabilistic databases. In extensional query evaluation, the entire probabilistic inference can be pushed into the database engine and is therefore processed as efficiently as standard SQL queries. The relational queries that can be evaluated this way are called safe queries. In intensional query evaluation, the probabilistic inference is performed over a propositional formula called a lineage expression: every relational query can be evaluated this way, but the data complexity depends heavily on the query being evaluated and can be #P-hard. The book also discusses some advanced topics in probabilistic data management, such as top-k query processing, sequential probabilistic databases, indexing and materialized views, and Monte Carlo databases.
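The contrast between the two evaluation styles can be made concrete with a small sketch. The Python below is not taken from the book: the tables R and S, their probabilities, and the function names are hypothetical, chosen only to illustrate the idea on the Boolean query "there exist x, y such that R(x) and S(x, y)" over tuple-independent tables. The extensional function combines probabilities operator by operator, which is valid here because this particular query is assumed safe. The intensional function builds the lineage as a disjunction of conjunctions and sums the weights of all possible worlds that satisfy it, which takes time exponential in the number of tuples.

```python
from itertools import product

# Hypothetical tuple-independent tables: each tuple maps to its marginal probability.
R = {"a": 0.5, "b": 0.8}
S = {("a", 1): 0.4, ("a", 2): 0.5, ("b", 1): 0.25}


def extensional(R, S):
    """P(exists x, y: R(x) and S(x, y)), computed operator by operator."""
    p_no_answer = 1.0
    for x, p_r in R.items():
        # Independent project over y: probability that no S(x, y) is present.
        p_no_s = 1.0
        for (x2, _y), p_s in S.items():
            if x2 == x:
                p_no_s *= 1.0 - p_s
        # Independent join of R(x) with "some S(x, y) is present".
        p_x = p_r * (1.0 - p_no_s)
        # Independent project over x.
        p_no_answer *= 1.0 - p_x
    return 1.0 - p_no_answer


def intensional(R, S):
    """Same probability, computed from the lineage by enumerating all worlds."""
    variables = [("R", x) for x in R] + [("S", t) for t in S]
    probs = [R[x] for x in R] + [S[t] for t in S]
    # Lineage in DNF: one clause (R-tuple AND S-tuple) per joining pair.
    clauses = [(("R", x), ("S", (x, y))) for (x, y) in S if x in R]
    total = 0.0
    for world in product([False, True], repeat=len(variables)):
        present = dict(zip(variables, world))
        if any(all(present[v] for v in clause) for clause in clauses):
            weight = 1.0
            for bit, p in zip(world, probs):
                weight *= p if bit else 1.0 - p
            total += weight
    return total


if __name__ == "__main__":
    print(extensional(R, S), intensional(R, S))
```

On this toy input both functions agree up to floating-point rounding (about 0.48). The extensional version touches each tuple a bounded number of times, while the brute-force intensional version enumerates 2^5 worlds; practical intensional systems use far better inference methods than plain enumeration.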

Table of Contents: Overview / Data and Query Model / The Query Evaluation Problem / Extensional Query Evaluation / Intensional Query Evaluation / Advanced Techniques

About the Author(s)

Dan Suciu, University of Washington
Dan Suciu is a Professor in Computer Science at the University of Washington. He received his Ph.D. from the University of Pennsylvania in 1995, then was a principal member of the technical staff at AT&T Labs until he joined the University of Washington in 2000. Professor Suciu conducts research in data management, with an emphasis on topics that arise from sharing data on the Internet, such as management of semistructured and heterogeneous data, data security, and managing data with uncertainties. He is a co-author of the book Data on the Web: From Relations to Semistructured Data and XML. He holds twelve US patents, has received the 2000 ACM SIGMOD Best Paper Award and the 2010 PODS Ten Years Best Paper Award, and is a recipient of the NSF CAREER Award and of an Alfred P. Sloan Fellowship. Suciu's PhD students Gerome Miklau and Christopher Re received the ACM SIGMOD Best Dissertation Award in 2006 and 2010, respectively, and Nilesh Dalvi was a runner-up in 2008.

Dan Olteanu, University of Oxford
Dan Olteanu has been a University Lecturer (equivalent to Assistant Professor in North America) in the Department of Computer Science at the University of Oxford and a Fellow of St. Cross College since September 2007. He received his Dr. rer. nat. in Computer Science from Ludwig Maximilian University of Munich in 2005. Before joining Oxford, he was a post-doctoral researcher with Professor Christoph Koch at Saarland University, a visiting scientist at Cornell University, and a temporary professor at Ruprecht Karl University in Heidelberg. His main research is on theoretical and system aspects of data management, with a current focus on Web data, provenance information, and probabilistic databases.

Christopher Re, University of Wisconsin-Madison
Christopher (Chris) Re is currently an Assistant Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. The goal of his work is to enable users and developers to build applications that more deeply understand data. In many applications, machines can only understand the meaning of data statistically, e.g., user-generated text or data from sensors. To attack this challenge, Chris's recent work has been to build a system, Hazy, that integrates a handful of statistical operators with a standard relational database management system. To support this work, Chris received the NSF CAREER Award in 2011. Chris received his PhD from the University of Washington, Seattle, under the supervision of Dan Suciu. For his PhD work in the area of probabilistic data management, Chris received the SIGMOD 2010 Jim Gray Dissertation Award. His PhD work produced two systems: Mystiq, a system to manage relational probabilistic data, and Lahar, a streaming probabilistic database.

Christoph Koch, Ecole Polytechnique Federale de Lausanne
Christoph Koch is a Professor of Computer Science at Ecole Polytechnique Federale de Lausanne (EPFL) in Lausanne, Switzerland. He is interested in both the theoretical and systems-oriented aspects of data management, and he currently works on managing uncertain and probabilistic data; research at the intersection of databases, programming languages, and compilers; community data management systems; and data-driven games. He received his PhD from TU Vienna, Austria, in 2001, for research done at CERN, Switzerland, and subsequently held positions at TU Vienna (2001-2002; 2003-2005), the University of Edinburgh (2002-2003), Saarland University (2005-2007), and Cornell University (2006; 2007-2010), before joining EPFL in 2010. He won best paper awards at PODS 2002 and SIGMOD 2011, received a Google Research Award (2009), and has been PC co-chair of DBPL 2005, WebDB 2008, and ICDE 2011.
