Talks

PhD Defense: Towards practical complex question answering

Chen Zhao

4105 Brendan Iribe Center for Computer Science and Engineering (IRB)

Wednesday, January 12, 2022, 3:00-5:00 pm


Abstract

Question answering (QA) is one of the most important and challenging tasks in understanding human language. With the help of large-scale benchmarks, there has been tremendous success in building neural QA systems, and this progress has been deployed in commercial systems such as search engines. However, most QA systems target rather simple questions that can be answered from a single evidence piece (e.g., a sentence). In many real scenarios, users also ask complex questions that require multiple evidence pieces, and search engines fail to answer them. The goal of this dissertation is to tackle the complex QA problem from different angles.

We first study complex QA using text collections as a knowledge source. We build two QA systems that rely on a free-text knowledge graph from Wikipedia. By extracting a question-grounded subgraph and using a graph neural network to reason over it, the proposed QA systems achieve state-of-the-art results on multiple complex QA benchmarks.

We then present two solutions that address key assumptions that make state-of-the-art QA systems difficult to generalize beyond specific benchmarks. We first address the assumption that the given text corpus is semi-structured by hyperlinks. We propose a multi-step dense retrieval method to model the implicit relationships between evidence pieces. The retriever is competitive with state-of-the-art systems on complex QA benchmarks without using any semi-structured information. To further address the assumption that annotated evidence labels are given during training, we focus on the weakly supervised setting, in which only question-answer pairs are available. We propose an iterative approach that improves a weak retriever by alternating between finding evidence with the up-to-date model and encouraging the model to learn the most likely evidence. Without using any evidence labels, our approach is on par with fully supervised counterparts.

We also study complex QA using tables as a knowledge source. We focus on a practical problem that benchmarks overlook: domain generalization for mathematical operations over columns. We first construct benchmarks to quantify this problem, and then address it by incorporating the necessary domain knowledge through table schema preprocessing. Our approach significantly outperforms baselines on this problem and, as a result, boosts overall performance.

Examining Committee:

Chair: Dr. Jordan Boyd-Graber
Co-Chair: Dr. Hal Daumé III
Dean's Representative: Dr. Philip Resnik
Members: Dr. Abhinav Srivastava, Dr. Hannaneh Hajishirzi

Bio

Chen Zhao is a sixth-year Ph.D. student advised by Prof. Jordan Boyd-Graber and Prof. Hal Daumé III. His research focuses mainly on question answering and semantic parsing. Zhao will join New York University as a postdoctoral researcher in Spring 2022.

This talk is organized by Tom Hurst