Auditing Memorization, Dissecting Mechanisms, and Evaluating Behavior of Large Language Models
Wednesday, September 24, 2025, 11:00 am-12:00 pm
Abstract

The widespread adoption of large language models (LLMs) places a responsibility on the AI research community to rigorously study and understand them. In this talk, I will describe my group’s research on analyzing LLMs’ memorization of pre-training data, their internal mechanisms, and their downstream behavior. First, I will introduce the Hubble project, in which we have pre-trained LLMs (up to 8B parameters) on controlled pre-training corpora to understand when and how they memorize sensitive data related to copyright risks, privacy leakage, and test set contamination; we envision these models as a valuable open-source resource for scientific inquiry into LLM memorization. Next, I will describe my group’s work on understanding how language models work internally, including vignettes about how they perform arithmetic with Fourier features and how they can learn optimization subroutines for in-context learning. Finally, I will highlight a recent collaboration with USC oncologists in which we uncover LLM sycophancy issues that arise when patients ask these models for medical advice.

Bio

Robin Jia is an Assistant Professor of Computer Science at the University of Southern California. He received his Ph.D. in Computer Science from Stanford University, where he was advised by Percy Liang. He has also spent time as a visiting researcher at Facebook AI Research, working with Luke Zettlemoyer and Douwe Kiela. He is interested broadly in natural language processing and machine learning, with a focus on scientifically understanding NLP models. Robin’s work has received best paper awards at ACL and EMNLP.

This talk is organized by Wei Ai.