Topic models are a useful and ubiquitous tool for discovering the main themes (topics) of a corpus, and they have been successfully applied across many research areas. However, the discovered topics are not always meaningful: some topics conflate two or more themes into one; two different topics can be near duplicates; and some topics make no sense at all. For many users in computational social science, digital humanities, and information studies who are not machine learning experts, existing models and frameworks are often a “take it or leave it” proposition.
This talk presents interactive topic modeling (ITM), a framework that allows untrained users to easily and iteratively encode their feedback into topic models as correlations between words. Because low latency is crucial in interactive systems, we develop more efficient inference algorithms for this model and validate the framework with both simulated and real users. In addition, we apply this model to domain adaptation in statistical machine translation, and we present a preliminary exploration of spectral learning methods for this model to further improve efficiency.
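To make the feedback mechanism concrete, here is a minimal sketch of one piece of such a pipeline: merging user-supplied "these words belong together" groups into disjoint correlation sets that a constrained topic model could then respect. The function name and data format are illustrative assumptions, not the ITM implementation (which builds tree-based topic priors from this kind of feedback).

```python
def merge_correlations(feedback):
    """Merge user-supplied must-link word groups into disjoint
    correlation sets: groups sharing any word are unioned together.
    (Illustrative sketch, not the actual ITM code.)"""
    groups = []
    for words in feedback:
        new_group = set(words)
        remaining = []
        for g in groups:
            if g & new_group:      # overlapping groups are merged
                new_group |= g
            else:
                remaining.append(g)
        remaining.append(new_group)
        groups = remaining
    return groups

# Example: a user first links "dollar" with "euro", then "euro" with "yen";
# the two pieces of feedback collapse into one correlation set.
feedback = [["dollar", "euro"], ["euro", "yen"], ["senate", "congress"]]
print(merge_correlations(feedback))
```

The merged sets could then seed the model's priors so that correlated words are encouraged to appear in the same topic.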
Yuening is a final-year Computer Science Ph.D. student working with Jordan Boyd-Graber.