CLIP Research Highlights: Embedding Senses and Reinforcement Learning with no Incremental Feedback
Fenfei Guo and Amr Sharaf - University of Maryland
Wednesday, March 14, 2018, 11:00 am-12:00 pm
This week, we'll hear about two research projects by CLIP PhD students.
Title: Inducing and Embedding Senses with Scaled Gumbel Softmax
Speaker: Fenfei Guo
Abstract: We propose an unsupervised model that simultaneously learns 1) interpretable sense embeddings and 2) how to select which sense to use in a given context. It uses a modified Gumbel softmax function for differentiable discrete sense selection. Our model not only produces sense embeddings that are competitive on multiple downstream evaluations but also discovers multiple interpretable, distinct sense groups per word and achieves state-of-the-art results on a human evaluation task.
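To make "differentiable discrete sense selection" concrete, here is a minimal sketch of the standard Gumbel softmax relaxation in plain Python. It is not the scaled variant presented in the talk, whose details the abstract does not give. The sense labels, vectors, and the use of dot products as logits are made-up assumptions for illustration only.

```python
import math
import random


def gumbel_softmax(logits, temperature=0.5, rng=random):
    """Return a relaxed one-hot sample (weights summing to 1) over the logits."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 2-d sense vectors for the word "bank" and a context vector.
sense_vectors = {"bank/finance": [1.0, 0.1], "bank/river": [0.1, 1.0]}
context = [0.9, 0.2]

# Assumption: score each sense by its dot product with the context.
logits = [dot(vec, context) for vec in sense_vectors.values()]
weights = gumbel_softmax(logits, temperature=0.5, rng=random.Random(0))

# Soft mixture of the sense vectors; it approaches a single sense as the
# temperature goes to 0, while staying differentiable in the logits.
mixed = [
    sum(w * vec[i] for w, vec in zip(weights, sense_vectors.values()))
    for i in range(len(context))
]
```

Lower temperatures push the weights toward a one-hot choice of sense; higher temperatures give smoother mixtures.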
Title: Residual Loss Prediction: Reinforcement Learning with no Incremental Feedback
Speaker: Amr Sharaf

Abstract: State-of-the-art learning-based systems require enormous, costly datasets on which to train supervised models. To progress beyond this requirement, we need learning systems that can interact with their environments, collect feedback, and improve over time. In most real-world settings, such feedback is sparse and delayed.
In this talk, I'll present Residual Loss Prediction (RESLOPE), a novel learning algorithm that solves reinforcement learning and bandit structured prediction problems with very sparse feedback. RESLOPE learns an internal representation of a denser reward function and operates as a reduction to contextual bandits. It uses its learned loss representation to solve the credit assignment problem, and a contextual bandit oracle to trade off exploration and exploitation. RESLOPE enjoys a no-regret, reduction-style theoretical guarantee and outperforms state-of-the-art reinforcement learning algorithms in both MDP environments and bandit structured prediction settings.
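The idea of learning a denser loss from a single episode-level signal can be illustrated with a toy sketch. This is not the RESLOPE algorithm: it omits the contextual bandit oracle and exploration entirely, and the linear predictor, one-hot features, learning rate, and toy episodes are all assumptions made for illustration. It only shows how per-step loss predictions can be fit so that their sum matches the one loss observed at the end of each episode.

```python
def predict(weights, features):
    """Predicted loss for a single step (linear model, an assumption)."""
    return sum(w * x for w, x in zip(weights, features))


def update_from_episode(weights, step_features, episode_loss, lr=0.05):
    """One gradient step on 0.5 * (sum of per-step predictions - episode_loss)^2."""
    total_pred = sum(predict(weights, f) for f in step_features)
    error = total_pred - episode_loss
    # The gradient w.r.t. each weight is error * (sum of that feature over steps).
    grad = [
        error * sum(f[i] for f in step_features)
        for i in range(len(weights))
    ]
    return [w - lr * g for w, g in zip(weights, grad)]


# Toy setup: two actions with one-hot features. Action 1 secretly costs 1 and
# action 0 costs 0, but the learner only sees the total loss per episode.
ONE_HOT = {0: [1.0, 0.0], 1: [0.0, 1.0]}
episodes = [[0, 0, 1], [1, 1, 0], [0, 1, 0], [1, 1, 1], [0, 0, 0]]

weights = [0.0, 0.0]
for _ in range(100):
    for actions in episodes:
        features = [ONE_HOT[a] for a in actions]
        observed_loss = float(sum(actions))  # sparse, episode-level feedback
        weights = update_from_episode(weights, features, observed_loss)

# weights now approximates the hidden per-step costs of each action.
```

In this toy problem the per-step costs are recoverable because the episodes mix the two actions in different proportions; with identical episodes the decomposition would not be unique.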
Fenfei Guo is a Ph.D. student working with Jordan Boyd-Graber. Her research interests lie in interpretable representation learning for language. She has been working on multi-sense word embeddings, user-interactive sense embeddings, and tracking topics in conversations.

Amr Sharaf is a Ph.D. student in the Computational Linguistics and Information Processing (CLIP) Lab of the Department of Computer Science at the University of Maryland, advised by Hal Daumé III. His research focuses on developing interactive learning algorithms in the context of structured prediction for artificial intelligence and natural language processing. 




This talk is organized by Marine Carpuat