Talks

Learning from Speech Production for Improved Recognition

Karen Livescu - Toyota Technological Institute at Chicago

Friday, January 31, 2014, 11:00 am-12:00 pm

Abstract

Ideas from speech production research have motivated several lines of work in the speech recognition research community. Unfortunately, our understanding of speech articulation is still quite limited, and articulatory measurement data is scarce. How can we take advantage of the potential usefulness of speech production, without relying too much on noisy information?

This talk will cover recent work exploring this area, with the theme of using machine learning ideas to automatically infer information where our knowledge and data are lacking. The talk will describe new techniques for deriving improved acoustic features using articulatory data in a multi-view learning setting. The techniques here are based on canonical correlation analysis and its nonlinear extensions, including our recently introduced extension using deep neural networks. Time permitting, the talk will also cover recent work using no articulatory data at all, but treating articulatory information as hidden variables in models for lexical access and spoken term detection.

Bio

Karen Livescu is an Assistant Professor at TTI-Chicago, where she has been since 2008. Previously, she completed her PhD at MIT in the Spoken Language Systems group of the Computer Science and Artificial Intelligence Laboratory, and was a post-doctoral lecturer in the MIT EECS department. Karen's main research interests are in speech and language processing, with a slant toward combining machine learning with knowledge about linguistics and speech science. Her recent work has included multi-view learning of speech representations, articulatory models of pronunciation variation, discriminative training with low resources for spoken term detection and pronunciation modeling, and automatic sign language recognition. She is a member of the IEEE Spoken Language Technical Committee, an associate editor for IEEE Transactions on Audio, Speech, and Language Processing, a subject editor for Speech Communication, and an organizer or co-organizer of a number of recent workshops, including the ISCA SIGML workshops on Machine Learning in Speech and Language Processing, the Midwest Speech and Language Days, and the Interspeech Workshop on Speech Production in Automatic Speech Recognition.

This talk is organized by Jimmy Lin.