Automatic Text Simplification

Horacio Saggion
Thanks to the availability of texts on the Web in recent years, increased knowledge and information have been made available to broader audiences. However, the way in which a text is written (its vocabulary, its syntax) can make it difficult to read and understand for many people, especially those with poor literacy, cognitive or linguistic impairment, or limited knowledge of the language of the text. Texts containing uncommon words or long and complicated sentences can be difficult for people to read and understand, as well as difficult for machines to analyze. Automatic text simplification is the process of transforming a text into another text which, ideally conveying the same message, is easier for a broader audience to read and understand. The process usually involves replacing difficult or unknown phrases with simpler equivalents and transforming long, syntactically complex sentences into shorter and less complex ones. Automatic text simplification, a research topic that started 20 years ago, has now taken on a central role in natural language processing research, not only because of the interesting challenges it poses but also because of its social implications. This book presents past and current research in text simplification, exploring key issues including automatic readability assessment, lexical simplification, and syntactic simplification. It also provides a detailed account of the machine learning techniques currently used in simplification, describes full systems designed for specific languages and target audiences, and presents available resources for research and development together with techniques for evaluating text simplification.
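The lexical substitution step mentioned above (replacing difficult words with simpler equivalents) can be illustrated with a minimal sketch. It assumes a hypothetical toy synonym table and word-frequency list (`SYNONYMS`, `FREQUENCY`) and uses word frequency as a proxy for simplicity, a common baseline heuristic rather than the method of any particular system described in the book. Syntactic transformations such as sentence splitting are not covered.

```python
import re

# Hypothetical toy resources (assumptions for illustration only):
# a synonym table and word frequencies. A real system would draw
# these from a thesaurus and counts over a large corpus.
SYNONYMS = {
    "commence": ["begin", "start"],
    "utilize": ["use", "employ"],
    "numerous": ["many", "several"],
}
FREQUENCY = {
    "begin": 900, "start": 1200, "use": 5000, "employ": 300,
    "many": 4000, "several": 1500,
    "commence": 40, "utilize": 60, "numerous": 200,
}


def simplest(word: str) -> str:
    """Return the most frequent candidate among the word and its synonyms."""
    candidates = [word] + SYNONYMS.get(word, [])
    return max(candidates, key=lambda w: FREQUENCY.get(w, 0))


def simplify(sentence: str) -> str:
    """Replace each word with its most frequent (assumed simplest) synonym."""
    def repl(match: re.Match) -> str:
        token = match.group(0)
        lower = token.lower()
        best = simplest(lower)
        if best == lower:
            return token  # no simpler candidate: keep the original token
        # Preserve an initial capital letter from the original token.
        return best.capitalize() if token[0].isupper() else best

    return re.sub(r"[A-Za-z]+", repl, sentence)


print(simplify("We will commence and utilize numerous tools."))
```

Run as written, this prints "We will start and use many tools.". A frequency-only ranking ignores context, so it cannot tell which sense of an ambiguous word is meant; that limitation is one reason lexical simplification is treated as a research problem in its own right.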

Table of Contents

Readability and Text Simplification
Lexical Simplification
Syntactic Simplification
Learning to Simplify
Full Text Simplification Systems
Applications of Automatic Text Simplification
Text Simplification Resources and Evaluation
Author's Biography

About the Author

Horacio Saggion, Department of Information and Communication Technologies, Universitat Pompeu Fabra
Horacio Saggion is an Associate Professor at the Department of Information and Communication Technologies, Universitat Pompeu Fabra (UPF), Barcelona. He is head of the Large Scale Text Understanding Systems Lab and a member of the Natural Language Processing research group (TALN), where he works on automatic text summarization, text simplification, information extraction, sentiment analysis, and related topics. His research is empirical, combining symbolic, pattern-based approaches with statistical and machine learning techniques. Saggion is also an active teacher and student supervisor. He holds a Ph.D. in computer science from Université de Montréal, Canada. He obtained his B.Sc. in computer science from Universidad de Buenos Aires, Argentina, and his M.Sc. in computer science from UNICAMP, Brazil. Before joining Universitat Pompeu Fabra, Saggion worked at the University of Sheffield, UK, for almost ten years on a number of national and European research projects developing competitive human language technology in the areas of text summarization and question answering. Saggion was also an invited senior researcher at the Center for Language and Speech Processing, Johns Hopkins University, USA, for a project on multilingual text summarization. He is currently a principal investigator in a number of national and European research projects in text summarization, text simplification, and information extraction. Saggion has published over 100 works in leading scientific journals, conferences, and books in the field of human language technology. He has organized several international workshops in the areas of text summarization and information extraction, and was also co-chair of the 2009 Symposium on Information and Human Language Technology (STIL) and chair of the 30th Conference of the Spanish Society for Natural Language Processing (SEPLN). He is co-editor of the book Multi-source, Multilingual Information Extraction and Summarization, Springer, 2013.
He is a regular program committee member for international conferences such as ACL, EACL, COLING, EMNLP, IJCNLP, and IJCAI, and is an active reviewer for international journals in computer science, information processing, and human language technology. Saggion has given courses, tutorials, and invited talks at a number of international events, including COLING, LREC, ESSLLI, IJCNLP, NLDB, and RuSSIR. He has received a number of grants and fellowships throughout his research career from institutions including Fundación Antorchas, the Argentinian Ministry of Education, the Canadian International Development Agency, and the Ramón y Cajal Research Program.

