Human in the Reinforcement Learning Loop
Friday, October 18, 2019, 11:00 am-12:00 pm
Abstract

Reinforcement learning techniques aim to solve complex decision-making problems entirely by interaction with an environment, together with an external reward function that signals whether the model's behavior is good or bad. While there have been some astonishing successes here, such as winning at Atari games, Go, StarCraft, DOTA, and more, these approaches are incredibly data-hungry: they need an unreasonably large number of trials in order to learn. As a result, they are largely limited to fully simulated settings (like games) that can be played, and failed at, millions or billions of times before success. I'll discuss work in two high-level directions that aim to bring a human into the loop in order to make learning feasible in lower-resource settings. The first is algorithms that give experts new ways to provide "advice" to the learning algorithm; the second is learning techniques that can learn to ask for help from humans in the environment when they need it.
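
To make the second direction concrete, here is a minimal, self-contained Python sketch of the "learning to ask for help" idea: a tabular Q-learning agent on a toy chain environment that queries an expert whenever its own value estimates for the available actions are too close to distinguish. This is only an illustration of the general pattern, not the speaker's algorithm; all names (ChainEnv, ask_expert, the uncertainty gap, etc.) are invented for this example.

    # Illustrative sketch only (standard library): a Q-learning agent that asks
    # a stand-in "human expert" for an action when its value estimates are too
    # close to call. Not the method from the talk; all names are hypothetical.
    import random
    from collections import defaultdict

    class ChainEnv:
        """Toy chain MDP: states 0..n-1, actions left/right, reward 1 at the right end."""
        def __init__(self, n=8):
            self.n, self.state = n, 0

        def reset(self):
            self.state = 0
            return self.state

        def step(self, action):  # action: 0 = left, 1 = right
            self.state = max(0, self.state - 1) if action == 0 else min(self.n - 1, self.state + 1)
            done = self.state == self.n - 1
            return self.state, (1.0 if done else 0.0), done

    def ask_expert(state):
        """Stand-in for the human in the loop: this expert always moves right."""
        return 1

    def train(episodes=200, alpha=0.5, gamma=0.95, eps=0.1, gap=0.05):
        env = ChainEnv()
        Q = defaultdict(lambda: [0.0, 0.0])  # state -> [value of left, value of right]
        queries = 0
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                values = Q[s]
                if abs(values[0] - values[1]) < gap:
                    a = ask_expert(s)          # uncertain: ask for help
                    queries += 1
                elif random.random() < eps:
                    a = random.randrange(2)    # occasional exploration
                else:
                    a = max(range(2), key=lambda i: values[i])
                s2, r, done = env.step(a)
                target = r + (0.0 if done else gamma * max(Q[s2]))
                Q[s][a] += alpha * (target - Q[s][a])
                s = s2
        return Q, queries

    if __name__ == "__main__":
        Q, queries = train()
        print("expert queries used:", queries)

As learning progresses the value estimates separate, so the agent asks for help less and less often; the same hook could instead inject expert "advice" into action selection, which is the flavor of the first direction mentioned above.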

Bio
Hal Daumé III holds a professorship in Computer Science and Language Science at the University of Maryland, and is also a principal researcher in the machine learning and fairness groups at Microsoft Research New York City. He and his wonderful advisees study how to make machines more adept at human language by developing models and algorithms that learn from data. The two major questions that drive their research these days are: (1) how can we get computers to learn language through natural interaction with people/users? and (2) how can we do this in a way that promotes fairness, transparency, and explainability in the learned models?


This talk is organized by Ramani Duraiswami