PhD Proposal: Transfer Learning in Natural Language Processing through Interactive Feedback
Michelle Yuan
Tuesday, December 22, 2020, 12:00-2:00 pm
Machine learning models cannot easily adapt to new data domains and applications. For natural language processing (NLP), this is especially detrimental because language is perpetually changing: as people develop new ideas, written records reflect these innovations, and linguistic and cultural differences have produced thousands of distinct languages across the globe. Transfer learning transmits knowledge from source to target settings by modifying model architecture and optimization. This dissertation proposal takes a step further by including a "human in the loop". If language is a byproduct of human thought, then human feedback should help transfer knowledge for NLP problems. Therefore, our goal is to improve model generalization in low-resource settings through interactive learning.

First, we develop an active learning strategy to annotate examples for text classifiers trained on little to no data. State-of-the-art language models learn general text representations by predicting token occurrence over large corpora. Thus, our strategy uses language modeling loss to bootstrap classification uncertainty and sample representative points from surprisal clusters. Next, we refine cross-lingual word embeddings through user feedback for low-resource languages. Bilingual speakers transfer knowledge from English to the target language by aligning the cross-lingual embedding space. Finally, we create a multilingual, interactive topic modeling system for users to refine topics across languages. The user-constructed topic model bridges multilingual gaps in knowledge.
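The cold-start sampling step above can be sketched roughly as follows: cluster per-example surprisal vectors (assumed to be precomputed from a masked language model's token losses) and label the example nearest each cluster centroid. This is a minimal illustrative sketch, not the proposal's exact algorithm; the function name and the plain k-means loop are assumptions for illustration.

```python
import numpy as np

def select_for_annotation(surprisal_embeddings, k, seed=0):
    """Sketch of surprisal-based cold-start active learning:
    run k-means on per-example surprisal vectors, then pick the
    example nearest each centroid as a representative to label."""
    rng = np.random.default_rng(seed)
    X = np.asarray(surprisal_embeddings, dtype=float)
    # Initialize centroids from k distinct examples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(50):  # Lloyd's iterations
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # Representative = index of the point closest to each centroid.
    dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    return sorted(set(dists.argmin(axis=0)))
```

The returned indices are the examples a human would annotate first; because they sit in distinct surprisal clusters, they cover regions where the language model found the text differently surprising.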

In the proposed work, we plan to explore interactive learning for NLP problems that require a comprehensive understanding of human language. For tasks like coreference resolution and question answering, users can link entities to help the model automate information extraction. Therefore, we will design algorithms and interfaces for users to efficiently transfer knowledge by labeling text spans.

Examining Committee:
    Chair:      Dr. Jordan Boyd-Graber
    Dept. Rep:  Dr. John Dickerson
    Members:    Dr. Rachel Rudinger

Michelle Yuan is a fourth-year Ph.D. student in the Department of Computer Science, working with Prof. Jordan Boyd-Graber. Her interests lie in efficiently training machine learning models for problems in natural language processing.

This talk is organized by Tom Hurst