In its early years, the field of computer vision was largely motivated by researchers seeking computational models of biological vision and solutions to practical problems in manufacturing, defense, and medicine. For the past two decades or so, there has been an increasing interest in computer vision as an input modality in the context of human-computer interaction. Such vision-based interaction can endow interactive systems with visual capabilities similar to those important to human-human interaction, in order to perceive non-verbal cues and incorporate this information in applications such as interactive gaming, visualization, art installations, intelligent agent interaction, and various kinds of command and control tasks. Enabling this kind of rich, visual and multimodal interaction requires interactive-time solutions to problems such as detecting and recognizing faces and facial expressions, determining a person's direction of gaze and focus of attention, tracking movement of the body, and recognizing various kinds of gestures. In building technologies for vision-based interaction, there are choices to be made as to the range of possible sensors employed (e.g., single camera, stereo rig, depth camera), the precision and granularity of the desired outputs, the mobility of the solution, usability issues, etc. Practical considerations dictate that there is not a one-size-fits-all solution to the variety of interaction scenarios; however, there are principles and methodological approaches common to a wide range of problems in the domain. While new sensors such as the Microsoft Kinect are having a major influence on the research and practice of vision-based interaction in various settings, they are just a starting point for continued progress in the area.
In this book, we discuss the history, opportunities, and challenges of vision-based interaction; we review the state of the art and seminal works in detecting and recognizing the human body and its components; we explore both static and dynamic approaches to "looking at people" vision problems; and we place the computer vision work in the context of other modalities and multimodal applications. Readers should gain a thorough understanding of current and future possibilities of computer vision technologies in the context of human-computer interaction.
Table of Contents
Awareness: Detection and Recognition
Control: Visual Lexicon Design for Interaction
Applications of Vision-Based Interaction
Summary and Future Directions
About the Author(s)
Matthew Turk, University of California, Santa Barbara
Matthew Turk is a professor of Computer Science and former chair of the Media Arts and Technology program at the University of California, Santa Barbara, where he co-directs the UCSB Four Eyes Lab, focused on the "four I's" of Imaging, Interaction, and Innovative Interfaces. He received a B.S. from Virginia Tech, an M.S. from Carnegie Mellon University, and a Ph.D. from the Massachusetts Institute of Technology. Before joining UCSB in 2000, he worked at Microsoft Research, where he was a founding member of the Vision Technology Group in 1994. He is on the editorial board of the ACM Transactions on Interactive Intelligent Systems and the Journal of Image and Vision Computing, and he serves on advisory boards for the ACM International Conference on Multimodal Interaction and the IEEE International Conference on Automatic Face and Gesture Recognition. Prof. Turk was a general chair of the 2006 ACM Multimedia Conference and the 2011 IEEE Conference on Automatic Face and Gesture Recognition and is general chair of the upcoming 2014 IEEE Conference on Computer Vision and Pattern Recognition. He has received several best paper awards, most recently at the 2012 International Symposium on Mixed and Augmented Reality (ISMAR). He is an IEEE Fellow and the recipient of the 2011-2012 Fulbright-Nokia Distinguished Chair in Information and Communications Technologies.
Gang Hua, Stevens Institute of Technology
Gang Hua is an Associate Professor of Computer Science at Stevens Institute of Technology. He also currently holds an Academic Advisor position at IBM T. J. Watson Research Center. He was a Consulting Researcher at Microsoft Research in 2012. Before joining Stevens, he worked as a full-time researcher at leading industrial research labs at IBM, Nokia, and Microsoft. He received his Ph.D. in Electrical and Computer Engineering from Northwestern University in 2006. His research in computer vision studies the interconnections and synergies among visual data, the semantic and situated context, and users in the expanded physical world. It falls into three themes: human-centered visual computing, big visual data analytics, and vision-based cyber-physical systems. He is on the editorial board of IEEE Transactions on Image Processing and the IAPR Journal of Machine Vision and Applications.