Despite the explosion of online content worldwide, much information remains isolated by language barriers. While deep neural network models have dramatically improved the quality of machine translation (MT), truly breaking language barriers requires not only translating accurately, but also comparing what is said and how it is said across languages. In this talk, I will argue that modeling divergences from common assumptions about the data used to train MT systems can not only improve MT, but also help broaden the framing of MT to make it more responsive to user needs. I will first discuss recent work on automatically detecting cross-lingual semantic divergences, which occur when translation does not preserve meaning, and their impact on MT training. Next, I will introduce a training objective for neural sequence-to-sequence models that accounts for divergences between MT model hypotheses and reference human translation. Finally, I will argue that translation does not necessarily need to preserve all properties of the input and introduce a family of models that let us tailor translation style while preserving input meaning.
Marine Carpuat is an Assistant Professor in Computer Science at the University of Maryland. Her research focuses on multilingual natural language processing and machine translation. Before joining the faculty at Maryland, Marine was a Research Scientist at the National Research Council Canada. She received a PhD in Computer Science and an MPhil in Electrical Engineering from the Hong Kong University of Science & Technology, and a Diplôme d'Ingénieur from the French Grande École Supélec. Marine is the recipient of an NSF CAREER award, research awards from Google and Amazon, best paper awards at *SEM and TALN, and an Outstanding Teaching Award.