This book addresses the problem of articulatory speech synthesis based on computed vocal tract geometries and the basic physics of sound production in the vocal tract. Conventional analysis/synthesis methods rely on the well-known source-filter model, which assumes that the excitation and the filter are independent. We instead treat the entire vocal apparatus as one mechanical system that produces sound by means of fluid dynamics. The vocal apparatus is represented as a three-dimensional, time-varying mechanism, and sound inside it propagates as non-planar acoustic waves through a viscous, compressible fluid described by the Navier-Stokes equations.
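For reference, the compressible Navier-Stokes equations mentioned above can be written as a continuity equation and a momentum equation. This is the standard textbook form, not necessarily the exact formulation used in the book. It assumes a Newtonian fluid with constant dynamic viscosity, Stokes' hypothesis (zero bulk viscosity), and no body forces. Here $\rho$ is density, $\mathbf{u}$ is velocity, $p$ is pressure, and $\mu$ is dynamic viscosity; an energy equation and an equation of state are also needed to close the system.

```latex
\begin{aligned}
\frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\,\mathbf{u}) &= 0, \\
\rho\left(\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\,\mathbf{u}\right)
  &= -\nabla p + \mu\,\nabla^{2}\mathbf{u} + \frac{\mu}{3}\,\nabla(\nabla\cdot\mathbf{u}).
\end{aligned}
```

The $\nabla\cdot\mathbf{u}$ term vanishes for incompressible flow. Keeping it is what allows acoustic (pressure) waves to propagate in the model.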
We propose a combined minimum-energy and minimum-jerk criterion to compute the dynamics of the vocal tract during articulation. Theoretical error bounds and experimental results show that this method closely matches the phonetic target positions while avoiding abrupt changes in the articulatory trajectory. The vocal folds are set into aerodynamic oscillation by the flow of air from the lungs, and the modulated air stream then excites the moving vocal tract. Simulations with this approach show strong evidence of source-filter interaction.
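To illustrate the jerk part of the criterion above, the sketch below computes the classic closed-form minimum-jerk trajectory for a single articulatory parameter moving between two targets, with zero velocity and acceleration at both ends. This is only an assumption-laden illustration: it omits the minimum-energy term of the book's combined criterion, and the function name `min_jerk_trajectory` and its parameters are hypothetical.

```python
def min_jerk_trajectory(x0, xf, duration, n_samples=101):
    """Hypothetical helper: pure minimum-jerk path from x0 to xf.

    Illustrates only the jerk term; the book's combined criterion also
    weights energy, which is NOT modeled here.
    Returns (times, positions) sampled uniformly over [0, duration].
    """
    if duration <= 0 or n_samples < 2:
        raise ValueError("duration must be positive and n_samples >= 2")
    times, positions = [], []
    for i in range(n_samples):
        tau = i / (n_samples - 1)  # normalized time in [0, 1]
        # Quintic blend: zero velocity and acceleration at tau = 0 and 1.
        s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
        times.append(tau * duration)
        positions.append(x0 + (xf - x0) * s)
    return times, positions


if __name__ == "__main__":
    # Hypothetical example: one parameter moving from 0.0 to 1.0 in 0.2 s.
    t, x = min_jerk_trajectory(0.0, 1.0, 0.2)
    print(x[0], x[50], x[-1])
```

For a vector of articulatory parameters, the same blend `s` can be applied to each dimension. Because its derivative, 30τ²(1 − τ)², is never negative, each parameter moves monotonically toward its target without overshoot.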
Based on our results, we propose that the articulatory speech production model has the potential to synthesize speech and provide a compact parameterization of the speech signal that can be useful in a wide variety of speech signal processing problems.
Table of Contents
Estimation of Dynamic Articulatory Parameters
Construction of Articulatory Model Based on MRI Data
Vocal Fold Excitation Models
Experimental Results of Articulatory Synthesis
About the Author(s)
Stephen Levinson, University of Illinois at Urbana-Champaign
Stephen E. Levinson received his Ph.D. in Electrical Engineering from the University of Rhode Island, Kingston, Rhode Island, in 1974. From 1966 to 1969 he was a design engineer at the Electric Boat Division of General Dynamics in Groton, Connecticut. From 1974 to 1976 he held a J. Willard Gibbs Instructorship in Computer Science at Yale University. In 1976 he joined the technical staff of Bell Laboratories in Murray Hill, NJ, where he conducted research in speech recognition and understanding. In 1979 he was a visiting researcher at the NTT Musashino Electrical Communication Laboratory in Tokyo, Japan. He held a visiting fellowship in the Engineering Department at Cambridge University in 1984, and in 1990 he became head of the Linguistics Research Department at AT&T Bell Laboratories, where he directed research in speech synthesis, speech recognition, and spoken language translation. He joined the Department of Electrical and Computer Engineering of the University of Illinois at Urbana-Champaign in 1997, where he teaches courses in speech and language processing and leads research projects in speech synthesis and automatic language acquisition. He is also a full-time faculty member of the Beckman Institute for Advanced Science and Technology, where he heads the Artificial Intelligence group. Dr. Levinson is a member of the Association for Computing Machinery, a fellow of the Institute of Electrical and Electronics Engineers, and a fellow of the Acoustical Society of America. He is a founding editor of the journal Computer Speech and Language and a former member and chair of the Industrial Advisory Board of the CAIP Center at Rutgers University. He is the author of more than 100 technical papers and holds 7 patents. His book Mathematical Models for Speech Technology was published in 2005 by John Wiley and Sons, Ltd.
Donald W. Davis, Jr., GD/Electric Boat
Donald W. Davis, Jr. received B.S., M.S., and Ph.D. degrees in Aeronautical Engineering from Purdue University in 1970, 1975, and 1981, respectively. Currently, he is a Staff Engineer at Electric Boat Corporation, where he works in the area of computational fluid dynamics (CFD). His research interests include fluid mechanics, heat transfer, computational methods, and turbulence modeling. He is also involved in applying advanced CFD tools to large, complex, industrially relevant turbulent flow problems.
Scott Slimon, GD/Electric Boat
Scott A. Slimon received a B.S. in Marine Engineering Systems from the United States Merchant Marine Academy, an M.S. in Mechanical Engineering from Rensselaer Polytechnic Institute, and a Ph.D. in Mechanical Engineering from the University of Connecticut. Currently, he is a Principal Engineer at Electric Boat Corporation, where he is responsible for the development and application of a computational fluid dynamics solver. His current research involves preconditioning techniques, multiphase flow, hybrid turbulence modeling, and flow-induced sound at low Mach numbers. He has applied this research to a number of large-scale external and internal flow problems supporting major Navy submarine platforms.
Jun Huang
Jun Huang is a Staff Research Engineer at SoundHound, Inc. in the San Francisco Bay Area. He is the resident machine learning (ML) expert and conducts applied research in ML and automatic speech recognition. He also solves real-world problems such as confidence modeling, pronunciation modeling for speech recognition, audio segmentation, and lyrics alignment for large-scale voice search and music search products.