Interpretability as the Inverse Machine Learning Pipeline
Sarah Wiegreffe
IRB 0318 (Gannon) or https://umd.zoom.us/j/93754397716?pwd=GuzthRJybpRS8HOidKRoXWcFV7sC4c.1
Friday, November 14, 2025, 11:00 am-12:00 pm
Abstract

Language models (LMs) power a rapidly growing and increasingly impactful suite of AI technologies. However, due to their scale and complexity, we lack a fundamental scientific understanding of much of LMs’ behavior, even when the models are open source. In this talk, I will describe some of our recent work on interpreting LMs through the lens of the classical machine learning pipeline. This includes 1) working backwards from behavioral analysis and explanation generation as a form of model evaluation, 2) interpreting model internals post-training, 3) understanding model training dynamics, and ultimately 4) attributing model behavior back to the training data, with the goal of building better training corpora for future LMs.

Bio

Sarah Wiegreffe is a natural language processing and machine learning researcher and an assistant professor in the Department of Computer Science at the University of Maryland. She works on the explainability and interpretability of deep learning systems for language, focusing on understanding how language models make predictions in order to make them more reliable, safe, and transparent to human users. She has been honored as a three-time Rising Star, in EECS, Machine Learning, and Generative AI. She was previously a postdoc at the Allen Institute for AI and the University of Washington, and before that, she received her Ph.D. and M.S. degrees from Georgia Tech.

This talk is organized by Samuel Malede Zewdu.