PhD Defense: Detecting Fine-Grained Semantic Divergences to Improve Translation Understanding Across Languages
Eleftheria Briakou
Friday, June 30, 2023, 2:30-4:30 pm
Abstract
One of the core goals of Natural Language Processing (NLP) is to develop computational representations and methods to compare and contrast text meaning across languages. Such methods are essential to many NLP tasks, such as question answering and information retrieval. One of their limitations is a lack of sensitivity to fine-grained semantic divergences, i.e., subtle meaning differences between sentences that overlap in content. Yet such differences abound even in parallel texts, i.e., texts in two different languages that are typically perceived as exact translations of each other. Detecting these fine-grained semantic divergences across languages matters for machine translation systems, as divergences yield challenging training samples, and for humans, who can benefit from a more nuanced understanding of the source.

In this thesis, we focus on detecting fine-grained semantic divergences in parallel texts to improve machine and human translation understanding. In our first piece of work, we provide empirical evidence that such small meaning differences exist and can be reliably annotated both at the sentence and at the sub-sentential level. We then show that they can be automatically detected, without supervision, by fine-tuning large pre-trained language models to rank synthetic divergences of varying granularity. In our second piece of work, we turn to analyzing the impact of fine-grained divergences on Neural Machine Translation (NMT) training and show that they negatively affect several aspects of NMT outputs, e.g., translation quality and confidence. Based on these findings, we propose two orthogonal approaches to mitigating the negative impact of divergences and improving machine translation quality: first, we introduce a divergence-aware NMT framework that models divergences at training time; second, we propose generation-based approaches for revising divergences in mined parallel texts, making the corresponding references more equivalent in meaning.
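As a concrete illustration of the ranking idea, here is a minimal sketch in Python: a pretrained multilingual encoder with a scoring head is fine-tuned with a margin ranking loss so that a less-perturbed synthetic pair scores as more equivalent than a more heavily perturbed one. The model choice (xlm-roberta-base), the linear scoring head, the margin value, and the toy sentence pairs are all illustrative assumptions, not the exact setup of the thesis.

```python
# Minimal sketch: rank synthetic divergences of varying granularity with a
# margin ranking loss. All specifics here (model, margin, data) are assumed.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"  # assumed encoder; any multilingual LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
scorer = nn.Linear(encoder.config.hidden_size, 1)  # equivalence score head

def score(src: str, tgt: str) -> torch.Tensor:
    """Score how meaning-equivalent a (source, target) pair is."""
    batch = tokenizer(src, tgt, return_tensors="pt", truncation=True)
    cls = encoder(**batch).last_hidden_state[:, 0]  # [CLS] representation
    return scorer(cls).squeeze(-1)

# Synthetic training triple: an original pair plus a version of the target
# perturbed at a coarser granularity (e.g., a substituted phrase).
src = "Elle a visité Paris en 2019."
tgt_fine = "She visited Paris in 2019."        # (near-)equivalent
tgt_coarse = "She visited London last year."   # more divergent perturbation

loss_fn = nn.MarginRankingLoss(margin=1.0)  # margin is an assumed value
s_fine, s_coarse = score(src, tgt_fine), score(src, tgt_coarse)
# Push the finer-grained (more equivalent) pair above the coarser one.
loss = loss_fn(s_fine, s_coarse, torch.ones_like(s_fine))
loss.backward()
```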

After exploring how subtle meaning differences in parallel texts impact machine translation systems, we switch gears to understand how divergence detection can serve humans directly. In our last piece of work, we extend our divergence detection methods to explain divergences from a human-centered perspective. We introduce a lightweight iterative algorithm that extracts contrastive phrasal highlights, i.e., highlights of the segments where divergences reside within bilingual texts, by explicitly formalizing the alignment between them. We show that our approach produces contrastive phrasal highlights that match human-provided rationales of divergences better than prior explainability approaches. Finally, based on extensive application-grounded evaluations, we show that contrastive phrasal highlights help bilingual speakers detect fine-grained meaning differences in human-translated texts, as well as critical errors due to local mistranslations in machine-translated texts.
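To give a flavor of how contrastive phrasal highlights might be extracted, the Python sketch below greedily searches for the pair of phrase spans whose joint removal most increases an equivalence score, and returns those spans as the highlights. The lexical-overlap scorer is a toy stand-in for a trained divergence detector, the single greedy pass is a simplification of the iterative algorithm presented in the talk, and the span-length penalty is likewise an illustrative choice.

```python
# Toy sketch of contrastive phrasal highlighting: find the (src, tgt) span
# pair whose joint removal most improves equivalence. All details assumed.

def equivalence_score(src_tokens, tgt_tokens):
    """Toy stand-in for a trained detector: higher = more equivalent."""
    overlap = len({t.lower() for t in src_tokens} &
                  {t.lower() for t in tgt_tokens})
    return overlap / max(len(src_tokens), len(tgt_tokens), 1)

def spans(tokens, max_len=3):
    """All contiguous token spans up to max_len, as (start, end) pairs."""
    n = len(tokens)
    return [(i, j) for i in range(n)
            for j in range(i + 1, min(i + max_len, n) + 1)]

def without(tokens, span):
    """Tokens with the given span removed."""
    i, j = span
    return tokens[:i] + tokens[j:]

def contrastive_highlights(src, tgt):
    """Return the (src_span, tgt_span) whose removal most helps, or None."""
    base = equivalence_score(src, tgt)
    best, best_gain = None, 0.0
    for s in spans(src):
        for t in spans(tgt):
            gain = equivalence_score(without(src, s), without(tgt, t)) - base
            gain -= 0.01 * ((s[1] - s[0]) + (t[1] - t[0]))  # prefer minimal spans
            if gain > best_gain:
                best, best_gain = (s, t), gain
    return best

src = "She visited Paris in 2019".split()
tgt = "She visited London last year".split()
print(contrastive_highlights(src, tgt))  # spans covering the divergent phrases
```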
Examining Committee

Chair:
Dr. Marine Carpuat

Dean's Representative:
Dr. Philip Resnik

Members:
Dr. Hal Daumé
Dr. Leo Zhicheng Liu
Dr. Luke Zettlemoyer (University of Washington)

Bio

Eleftheria Briakou is a fifth-year Ph.D. candidate in the Department of Computer Science at the University of Maryland, College Park. She is a member of the CLIP lab, working with Marine Carpuat. Eleftheria's research interests are broadly in Multilingual Natural Language Processing (NLP) and Machine Translation. Her most recent work focuses on building better models across diverse languages by having humans and AI join forces.

This talk is organized by Tom Hurst