Training effective neural cross-language ranking models + Toward more practical complex question answering
Suraj Nair and Chen Zhao - UMD
Wednesday, September 29, 2021, 11:00 am-12:00 pm
Abstract

This week we will have two presentations from CLIP Lab members. Please see below for their abstracts:

Abstract of "Training effective neural cross-language ranking models"
The advent of transformer-based models (e.g., BERT, RoBERTa) has led to the rise of neural ranking models that improve the effectiveness of monolingual retrieval systems beyond lexical term-matching models such as BM25. The common way to use these models is a retrieve-and-rerank setup, where an initial system (e.g., BM25) retrieves a set of documents that are then reranked by the trained neural model. However, such pipelines have yet to be fully explored for Cross-Language Information Retrieval (CLIR), where the goal is to retrieve documents expressed in a different language than the query. In this talk, I will first introduce the design choices for building such retrieve-and-rerank pipelines for CLIR and compare their effectiveness on ad-hoc document ranking tasks.
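
As a rough illustration of the setup described above, the following is a minimal sketch of a retrieve-and-rerank pipeline. The names first_stage_search and neural_score are hypothetical stand-ins for a lexical retriever and a trained cross-language reranker; they are not tied to any particular library or to the specific systems presented in the talk.

from typing import Callable, List, Tuple

def retrieve_and_rerank(
    query: str,
    first_stage_search: Callable[[str, int], List[Tuple[str, str]]],  # hypothetical: (query, k) -> [(doc_id, doc_text)]
    neural_score: Callable[[str, str], float],                        # hypothetical: (query, doc_text) -> relevance score
    k_retrieve: int = 1000,
    k_return: int = 10,
) -> List[Tuple[str, float]]:
    """Retrieve a candidate pool with a cheap lexical model, then rerank it neurally."""
    # Stage 1: lexical retrieval (e.g., BM25) bounds the candidate set and the overall recall.
    candidates = first_stage_search(query, k_retrieve)
    # Stage 2: the trained neural model rescores every (query, document) pair in the pool.
    scored = [(doc_id, neural_score(query, doc_text)) for doc_id, doc_text in candidates]
    # Return the highest-scoring documents under the neural model.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k_return]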
 
One downside of such cascaded pipelines is that the recall of the overall system is bounded by the lexical matching model, which cannot handle the vocabulary mismatch problem. An alternative approach is to use BERT-style models to encode the query and the document separately, commonly known as a dense retrieval model, and then match them using a similarity function (e.g., cosine similarity). In the second half of the talk, I will introduce ColBERT-X, a CLIR dense retrieval model built upon an existing monolingual architecture, ColBERT.
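
For intuition, here is a minimal sketch of the "late interaction" scoring used by ColBERT-style models, assuming the query and document have already been encoded into per-token vectors by separate (here unspecified) encoders: each query token contributes its maximum cosine similarity against any document token. Because document token embeddings can be computed and indexed offline, this avoids running a full cross-encoder over every query-document pair at query time.

import numpy as np

def late_interaction_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """query_vecs: (num_query_tokens, dim); doc_vecs: (num_doc_tokens, dim)."""
    # L2-normalize the token embeddings so dot products become cosine similarities.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T                        # (num_query_tokens, num_doc_tokens) similarity matrix
    return float(sim.max(axis=1).sum())  # best-matching document token per query token, summed

# Toy usage with random vectors standing in for encoder output.
rng = np.random.default_rng(0)
score = late_interaction_score(rng.normal(size=(8, 128)), rng.normal(size=(300, 128)))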
 
Abstract of "Toward more practical complex question answering"

Question answering (QA) is one of the most important and challenging tasks in understanding human language. With the help of large-scale benchmarks, state-of-the-art neural methods have made significant progress, even answering complex questions that require multiple pieces of evidence. Nevertheless, training existing state-of-the-art models relies on several assumptions (e.g., intermediate evidence annotations, a semi-structured corpus) that limit their applicability to academic testbeds. In this talk, I discuss several solutions that make current QA systems more practical.


I will first describe a state-of-the-art system for complex QA. I will then introduce a dense retrieval approach that iteratively forms an evidence chain through beam search over dense representations, without using semi-structured information. Finally, I will describe dense retrieval work in a weakly supervised setting, which learns to find evidence in a large corpus while relying only on distant supervision for model training.
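
To make the second idea concrete, here is a minimal sketch of growing an evidence chain with beam search over dense representations. The encode function (mapping the question plus the evidence gathered so far to a query vector) and the passage_embeddings matrix are hypothetical stand-ins, and scores are plain inner products, so this illustrates the general technique rather than the specific model presented in the talk.

from typing import Callable, List, Tuple
import numpy as np

def beam_search_evidence_chain(
    question: str,
    encode: Callable[[str, List[int]], np.ndarray],  # hypothetical: (question, chain so far) -> query vector (dim,)
    passage_embeddings: np.ndarray,                  # (num_passages, dim) precomputed dense corpus index
    chain_length: int = 2,
    beam_size: int = 5,
) -> List[Tuple[List[int], float]]:
    """Grow evidence chains hop by hop, keeping only the best partial chains."""
    beams: List[Tuple[List[int], float]] = [([], 0.0)]
    for _ in range(chain_length):
        expanded = []
        for chain, score in beams:
            query_vec = encode(question, chain)    # condition the query on evidence found so far
            sims = passage_embeddings @ query_vec  # inner-product relevance against every passage
            for pid in np.argsort(-sims)[:beam_size]:
                if pid not in chain:               # avoid retrieving the same passage twice
                    expanded.append((chain + [int(pid)], score + float(sims[pid])))
        # Prune: keep only the top-scoring partial chains for the next hop.
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams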


Bio

Suraj Nair is a Ph.D. student in the Department of Computer Science, advised by Douglas W. Oard. His research focuses on building models for effective and efficient retrieval across languages.

Chen Zhao is a sixth-year Ph.D. candidate in the CLIP lab, co-advised by Prof. Jordan Boyd-Graber and Prof. Hal Daumé III. His research interests lie in question answering, including knowledge representation from large text corpora for complex QA and semantic parsing over tables.


This talk is organized by Wei Ai.