Talks

Demystifying the Inner Workings of Language Models

Sarah Wiegreffe

IRB 4105 or https://umd.zoom.us/j/94340703410?pwd=rrXaGSXSpabcMTtDNmeCNf2Ih2fQYE.1

Wednesday, February 26, 2025, 11:00 am-12:00 pm

You are subscribed to this talk through .
You are watching this talk through .
You are subscribed to this talk. (unsubscribe, watch)
You are watching this talk. (unwatch, subscribe)
You are not subscribed to this talk. (watch, subscribe)

Abstract

Large language models (LLMs) power a rapidly-growing and increasingly impactful suite of AI technologies. However, due to their scale and complexity, we lack a fundamental scientific understanding of much of LLMs’ behavior, even when they are open source. The “black-box” nature of LMs not only complicates model debugging and evaluation, but also limits trust and usability. In this talk, I will describe how my research on interpretability (i.e., understanding models’ inner workings) has answered key scientific questions about how models operate. I will then demonstrate how deeper insights into LLMs’ behavior enable both 1) targeted performance improvements and 2) the production of transparent, trustworthy explanations for human users.

Bio

Sarah Wiegreffe is a postdoctoral researcher at the Allen Institute for AI (Ai2) and the Allen School of Computer Science and Engineering at the University of Washington. She has worked on the explainability and interpretability of neural networks for NLP since 2017, with a focus on understanding how language models make predictions in order to make them more transparent to human users. She has been honored as a 3-time Rising Star in EECS, Machine Learning, and Generative AI. She received her PhD in computer science from Georgia Tech in 2022, during which time she interned at Google and Ai2 and won the Ai2 outstanding intern award. She frequently serves on conference program committees, receiving outstanding area chair awards at ACL 2023 and EMNLP 2024. Her website is https://sarahwie.github.io/.

This talk is organized by Samuel Malede Zewdu