PhD Defense: EXPERT-IN-THE-LOOP FOR SEQUENTIAL DECISIONS AND PREDICTIONS
Kianté Brantley
Thursday, December 9, 2021, 9:00-11:00 am
Abstract
Sequential decisions and predictions are common problems in natural language processing, robotics, and video games. Essentially, an agent interacts with an environment to learn how to solve a particular problem. Research in sequential decisions and predictions has grown, due in part to the success of reinforcement learning. However, this success has come at the cost of algorithms that are very data-inefficient, making learning in the real world difficult.

Our primary goal is to make these algorithms more data-efficient using an expert-in-the-loop (e.g., imitation learning). Imitation learning is a technique for using an expert in sequential decision-making and prediction problems. Naive imitation learning suffers from a covariate shift problem (i.e., the training distribution differs from the test distribution). We propose methods and ideas to address this issue, along with other issues that arise in different styles of imitation learning. In particular, we study three broad areas of using an expert-in-the-loop for sequential decisions and predictions.
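The covariate shift problem mentioned above can be made concrete with a minimal sketch of naive imitation learning (behavioral cloning). This is an illustrative example only, not the talk's method; the function and parameter names are hypothetical:

```python
# Sketch of naive imitation learning ("behavioral cloning"):
# imitation is reduced to supervised learning on expert state-action
# pairs. All names here are hypothetical, for illustration only.

def behavioral_cloning(expert_trajectories, fit_classifier):
    """Fit a policy on states visited by the expert only."""
    states, actions = [], []
    for trajectory in expert_trajectories:
        for state, action in trajectory:
            states.append(state)
            actions.append(action)
    # Covariate shift: the policy is trained only on the expert's
    # state distribution. Once deployed, its own mistakes drift it
    # into states it never saw during training, where its predictions
    # are unreliable.
    return fit_classifier(states, actions)
```

The comment marks the failure mode: training and test state distributions diverge precisely because the learner, not the expert, chooses actions at test time.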

First, we study the most popular category of imitation learning, interactive imitation learning. Although interactive imitation learning addresses the covariate shift problem of naive imitation learning, it does so with a trade-off: it assumes access to an online interactive expert, which is often unrealistic. Instead, we propose a setting where this assumption is realistic and attempt to reduce the number of expert queries needed for interactive imitation learning.
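A well-known instance of interactive imitation learning is a DAgger-style loop, which queries the expert on states the learner itself visits. The sketch below is illustrative only (not the talk's algorithm), and `expert`, `rollout`, and `fit_policy` are hypothetical callables:

```python
# Illustrative DAgger-style interactive imitation learning loop.
# The expert is queried online on the learner's own states, which
# is what addresses covariate shift, and also what makes the
# online-expert assumption expensive.

def dagger(expert, rollout, fit_policy, initial_policy, n_iterations):
    """Aggregate expert labels on states the learner visits."""
    dataset = []            # aggregated (state, expert_action) pairs
    policy = initial_policy
    for _ in range(n_iterations):
        states = rollout(policy)                      # learner's states
        dataset += [(s, expert(s)) for s in states]   # expert queries
        policy = fit_policy(dataset)                  # retrain on all data
    return policy
```

Each `expert(s)` call is one query to the online expert; reducing the number of such queries is exactly the efficiency concern the paragraph above raises.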

We further introduce a new category of imitation learning algorithms called reward-learning imitation learning. Unlike interactive imitation learning, these algorithms address covariate shift using demonstration data instead of querying an online interactive expert. This category of algorithms assumes access to an underlying reinforcement learning algorithm that can optimize a reward function learned from demonstration data. We benchmark all algorithms in this category and relate them to modern structured prediction problems in NLP.

Beyond reward-learning imitation learning and interactive imitation learning, some problems cannot be naturally expressed and solved using these two categories of algorithms; consider, for example, an algorithm that must solve a task while satisfying safety constraints. We introduce expert-in-the-loop techniques that extend beyond traditional imitation learning paradigms, where an expert provides demonstration features or constraints instead of state-action pairs.

Examining Committee:
Chair: Dr. Hal Daumé III
Dean's Representative: Dr. John Baras
Members: Dr. Tom Goldstein
         Dr. Philip Resnik
         Dr. Geoff Gordon
         Dr. Kyunghyun Cho
Bio

Kianté Brantley is a Ph.D. candidate in computer science advised by Professor Hal Daumé III. Brantley designs algorithms that efficiently integrate domain knowledge into sequential decision-making problems. He is most excited about imitation learning and interactive learning—or, more broadly, settings that involve a feedback loop between a machine learning agent and the input the machine learning agent sees. He has published five first-author conference papers and co-authored three more. He won second place for his talk at the Natural Language, Dialog and Speech Symposium, a leading machine learning conference.

Brantley recently received a prestigious Computing Innovation Fellowship, which will support him for two years as a postdoc at Cornell University. He will study theoretical and practical aspects of learning-to-rank recommendation system problems with Professor Thorsten Joachims. The outcome of their study will be new methodologies with theoretical guarantees and practical benefits for sequential decision-making in recommendation systems.

As a Ph.D. student, Brantley was awarded the competitive Microsoft Research Dissertation Grant, the Association for Computing Machinery’s Special Interest Group on High-Performance Computing/Intel Computational and Data Science Fellowship, the National Science Foundation Louis Stokes Alliances for Minority Participation Bridge to the Doctorate Program Fellowship, the UMD Ann G. Wylie Dissertation Fellowship and the UMD Graduate School’s Dean’s Fellowship. Over the past four summers, he interned for Microsoft Research.

Before coming to UMD in 2016, Brantley attended the University of Maryland, Baltimore County, where he earned his bachelor's degree and master's degree (advised by Tim Oates) in computer science. He also worked as a developer for the U.S. Department of Defense from 2010 to 2017. In his free time, Brantley enjoys playing sports; his favorite sport at the moment is powerlifting.

This talk is organized by Tom Hurst