Topic modeling helps users understand vast document collections when there are too many documents, or too little time, to read each one individually: a journalist processing reports surrounding breaking news, or a legal team finding relevant emails during discovery. In some cases, it is necessary to bring end users' domain knowledge into topic models; however, modifying topic models often requires an expert understanding of the underlying algorithm. Recent work in human-in-the-loop topic modeling provides mechanisms for end users to guide topic modeling, such as specifying words to be added to or removed from topics. However, the design focus in topic modeling has been on algorithms rather than users, resulting in tools and visualizations that do not support users in understanding and interacting with topic models in the ways they want.
To increase the accessibility of topic models, we first focus on designing, implementing, and evaluating novel topic model visualizations to ensure that non-expert end users can best understand the output of topic modeling. These informed users are better able both to identify topics that need tweaking and to carry out that tweaking. In our second thread of work, we improve techniques to better support these users in modifying topics, making human-in-the-loop topic modeling more user-centric.
First, we performed a two-part user study to determine which visualizations best promote quick understanding of individual topics. This study compared three common topic visualizations (word list, word list with bars, and word cloud) alongside a visualization we developed to display word co-occurrence within topics. Additionally, to better support users in performing directed exploration and understanding of an overall topic model, we designed, implemented, and evaluated an interactive corpus exploration tool based on visualizing changes in topics over time.
Second, to ensure a user-centric approach to the design of our human-in-the-loop topic modeling tool, we implemented a tool supporting refinements that we had previously identified as preferred by users. We then conducted a user study to understand how users are affected by common interactive machine learning challenges, such as unpredictability and latency. Building on this work, we propose an additional study to determine whether the effect (positive or negative) of unpredictability and latency depends on whether the user views their relationship with the system as a partnership between equal teammates or as one of user control. We additionally propose comparing three refinement implementations for our human-in-the-loop topic modeling tool to determine which best supports efficient and effective curation of topic models.
We anticipate this work will produce the following contributions: human-centered design guidelines for topic model visualizations and interactive topic modeling; novel topic visualizations and an interactive topic modeling tool designed following these guidelines; and, more broadly, insight into teaming effects in interactive machine learning.
Co-Chair: Dr. Leah Findlater
Dept. rep: Dr. Mihai Pop