Evaluating Conversational Agents
Wednesday, December 11, 2019, 11:00 am-12:00 pm
Abstract

There has been a renewed focus on dialog systems, including non-task-driven conversational agents. Dialog is a challenging problem because it spans multiple conversational turns. To further complicate matters, a given context admits many valid responses, which may differ semantically from one another. This makes automatic evaluation difficult, which is why the current best practice for analyzing and comparing dialog systems is to use human judgments. This talk focuses on evaluation, presenting a theoretical framework for the systematic evaluation of open-domain conversational agents, including the use of Item Response Theory (Lord, 1968) for efficient chatbot evaluation and evaluation set creation. We introduce ChatEval (https://chateval.org), a unified framework for human evaluation of chatbots that augments existing tools and provides a web-based hub for researchers to share and compare their dialog systems.
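The abstract does not say which Item Response Theory model the talk uses. As a hedged illustration only, the sketch below assumes the common two-parameter logistic (2PL) model, in which each evaluation item has a discrimination `a` and a difficulty `b`, and each system has a latent ability `theta`. It shows one way IRT can make evaluation more efficient: rank items by their Fisher information at a given ability, so that fewer, more informative items need human judgments. The function names and toy item parameters are hypothetical, not taken from the talk or from ChatEval.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function (assumed model): probability that a system
    with ability theta is judged favorably on an item with discrimination a
    and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def most_informative(items: dict, theta: float, k: int) -> list:
    """Return the ids of the k items (id -> (a, b)) with the highest
    information at ability theta. Hypothetical helper for illustration."""
    ranked = sorted(
        items,
        key=lambda item_id: item_information(theta, *items[item_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy, made-up item parameters: (discrimination a, difficulty b).
items = {"q1": (1.0, 0.0), "q2": (2.0, 0.0), "q3": (1.0, 3.0)}
print(most_informative(items, theta=0.0, k=2))
```

Items whose difficulty sits near the system's ability and that discriminate sharply carry the most information, which is the intuition behind trimming an evaluation set without losing much ability to separate systems.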

Bio

João is an assistant research professor at Johns Hopkins University. He received his PhD from the University of Pennsylvania, advised by Lyle Ungar. His research focuses on Natural Language Generation, particularly deep learning methods for non-task-driven conversational agents (chatbots) and the evaluation of these models. His research also includes work on word and sentence embeddings, word and verb predicate clustering, and multi-scale models. He is broadly interested in Natural Language Processing, Time Series Analysis, and Deep Learning. João was a University of Pennsylvania Fontaine Fellow and a recipient of the 2018 Microsoft Research Dissertation Grant.

This talk is organized by Doug Oard.