Talks

Machine Learning for Machine-Assisted Data Annotation and Data Exploration

Wednesday, November 5, 2014, 11:00 am-12:00 pm

Abstract

We are investigating how to effectively use machine learning to provide assistance in the process of expert data annotation and to facilitate data exploration. While labeled data are essential for supervised learning of predictive models in NLP, manual annotation of large data sets can be cost-prohibitive. We reduce annotation cost by offering human annotators various forms of machine assistance designed to increase annotator speed and accuracy and by leveraging their annotations for prediction on larger volumes of data. One aspect of this work focuses on practical cost-conscious active learning (AL) in annotator-initiated environments. Employed properly, AL uses learned models to select data instances that are both informative and inexpensive to annotate. We present results showing that cost-conscious AL can significantly reduce the cost of annotating data in practice. We present a method for eliminating human wait time in AL, independent of specific learning and scoring algorithms, by making scores always available for all instances, using old (stale) scores when necessary. While a human is annotating, the machine trains models and scores instances in parallel to maximize the recency of the scores. Our "no-wait" method can be seen as a parameter-free, dynamic batch AL algorithm. The performance of our method depends on the relative amounts of time required for annotating, training, and scoring. Time permitting, I will also present some new research involving the inference of labels from multiple, fallible annotators.

Joint work with: Robbie Haertel, Paul Felt, and Kevin Seppi
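
The "no-wait" idea described in the abstract can be illustrated with a small Python sketch. This is not the speakers' implementation; it is a minimal illustration under stated assumptions. The class name StaleScoreBoard, the function rescore, and the choice to rank instances by score divided by cost are all hypothetical. The point it shows is only that the annotator's request is answered immediately from whatever scores are currently stored, while a separate thread refreshes those scores.

```python
import threading


class StaleScoreBoard:
    """Keeps one score per unlabeled instance so a request never has to wait.

    Hypothetical helper for illustration; scores may be stale.
    """

    def __init__(self, instance_ids, initial_score=1.0):
        self._lock = threading.Lock()
        self._scores = {i: initial_score for i in instance_ids}

    def update(self, instance_id, score):
        # Called by the scoring thread; ignores instances already annotated.
        with self._lock:
            if instance_id in self._scores:
                self._scores[instance_id] = score

    def pop_best(self, cost):
        # Called when the annotator asks for work. Uses the stored scores
        # as they are (possibly stale) and returns immediately.
        # Assumption: rank by score per unit of annotation cost.
        with self._lock:
            if not self._scores:
                return None
            best = max(self._scores, key=lambda i: self._scores[i] / cost[i])
            del self._scores[best]
            return best


def rescore(board, instance_ids, score_fn):
    # Stand-in for "train a model, then score instances" on another thread.
    for i in instance_ids:
        board.update(i, score_fn(i))


# Made-up annotation costs for three instances.
costs = {"a": 1.0, "b": 2.0, "c": 4.0}
board = StaleScoreBoard(costs)

# The first request is served from the initial scores, with no waiting.
first = board.pop_best(costs)

# Meanwhile a worker refreshes scores. It is joined here only to keep the
# example deterministic; in real use it would keep running in parallel.
new_scores = {"b": 1.0, "c": 8.0}
worker = threading.Thread(target=rescore, args=(board, ["b", "c"], new_scores.get))
worker.start()
worker.join()

second = board.pop_best(costs)
```

With equal initial scores the cheapest instance is served first; after the refresh, the instance with the better score-to-cost ratio is served next. How stale the scores are at request time depends on how long annotating takes relative to training and scoring, which is the dependence the abstract points out.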

Bio

Eric Ringger is an Associate Professor of Computer Science at Brigham Young University. He is director of the Natural Language Processing (NLP) Lab and is working with his students to solve the problem of machine-assisted exploratory textual data analysis. Their research toward solving that problem makes contributions in areas such as NLP, text mining with unsupervised topic models, text analytics, lightly supervised machine learning (including cost-conscious active learning), and machine assistance for human language annotation tasks. Data of interest include large text collections and historical document images. He teaches courses on algorithm design and analysis, NLP, and text mining. He was a Researcher in the NLP group at Microsoft Research in Redmond, Washington, from 1997 to 2005. Eric has a B.S. in Mathematics from BYU and M.S. and Ph.D. degrees in Computer Science from the University of Rochester. Up-to-date information regarding his publications can be found on Google Scholar: http://scholar.google.com/citations?user=XkkU-hwAAAAJ&hl=en

This talk is organized by Jimmy Lin