Knowledge discovery in textual data is a cornerstone of natural language processing (NLP), driving innovations that enable machines to uncover, interpret, and interact with human language. This dissertation advances human-centered knowledge discovery by exploring the critical intersection of interpretable topic models and powerful, large-scale language models (LLMs). While topic models offer comprehensible thematic distillation, they often lack semantic richness. Conversely, LLMs encapsulate vast world knowledge but can be opaque and difficult to steer. This work addresses this gap, arguing that the next generation of NLP tools must be designed for human-centered, in-the-loop interaction.
The core proposition of this research is that to be effective and reliable, these two approaches must be synthesized. We demonstrate this by: (1) developing novel topic modeling architectures that allow humans to guide machine representations, and (2) investigating how LLMs can, in turn, provide interpretable inferences to guide human decision-making in sensitive domains.
Methodologically, we enhance topic models in two ways. First, we introduce I-NTM, the first architecture for interactive neural topic models. By defining topics as movable embeddings, I-NTM gives users direct control to adjust the topic-word space, allowing them to find more relevant information in less time. Second, we bridge the gap between probabilistic structure and contextual knowledge by using LLMs as priors, demonstrably improving the coherence and adaptability of topic representations for downstream tasks.
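To make the idea of movable topic embeddings concrete, the sketch below is an illustration only, not the actual I-NTM implementation: the embeddings, vocabulary, and function names are assumptions. It shows how pulling a topic embedding toward a word of interest changes the topic's top words.

```python
# Illustrative sketch: topics as movable embeddings that a user can nudge
# toward a chosen word, after which the topic's top words are recomputed
# by cosine similarity. All values here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["court", "judge", "trial", "game", "score", "team"]
word_emb = rng.normal(size=(len(vocab), 16))   # hypothetical word embeddings
topic_emb = rng.normal(size=(2, 16))           # hypothetical topic embeddings

def top_words(topic_vec, k=3):
    """Rank the vocabulary by cosine similarity to a topic embedding."""
    sims = word_emb @ topic_vec / (
        np.linalg.norm(word_emb, axis=1) * np.linalg.norm(topic_vec) + 1e-9
    )
    return [vocab[i] for i in np.argsort(-sims)[:k]]

def move_topic(topic_vec, target_word, step=0.5):
    """User interaction: pull a topic embedding toward a word of interest."""
    target = word_emb[vocab.index(target_word)]
    return topic_vec + step * (target - topic_vec)

print("before:", top_words(topic_emb[0]))
topic_emb[0] = move_topic(topic_emb[0], "judge")  # user drags the topic toward "judge"
print("after: ", top_words(topic_emb[0]))
```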
We then investigate the complementary direction: applying LLMs to the task of inferring psychological dispositions from text. We find that open-source LLMs can outperform traditional, data-intensive models in a zero-shot setting, particularly in handling outliers. However, our analysis reveals a critical pitfall: LLMs often overfit to superficial linguistic cues and annotation biases rather than capturing the nuanced, latent nature of psychological traits.
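As a rough illustration of the zero-shot setup, and not the dissertation's exact prompts, models, or evaluation, the snippet below assumes an open-source instruction-tuned model from the Hugging Face Hub; the model name and prompt wording are placeholders.

```python
# Illustrative zero-shot sketch: ask an open-source instruction-tuned LLM
# to rate a psychological trait from a short text. Model and prompt are
# assumptions for demonstration, not the dissertation's setup.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # any open-source instruct model
)

post = "I spent the weekend reorganizing my bookshelf and planning next month's trip."
prompt = (
    "Rate the author's openness to experience on a 1-5 scale based on the text, "
    "and answer with a single number.\n\n"
    f"Text: {post}\nRating:"
)

result = generator(prompt, max_new_tokens=5, do_sample=False)
print(result[0]["generated_text"])
```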
Finally, we summarize the contributions of each chapter and conclude with a discussion of future directions. This dissertation argues that for NLP systems to be reliable partners in human-centered applications, they require structured human feedback, iterative refinement, and, importantly, evaluations that are grounded in real-world human complexity, not just surface-level metrics.
Kyle Seelman is a PhD student advised by Dr. Jordan Boyd-Graber. His research is focused on human-in-the-loop and human-centered language models.

