Probably Approximately Precision (Hallucination) and Recall (Mode Collapse) Learning
Han Shao
IRB 0318 (Gannon) or https://umd.zoom.us/j/93754397716?pwd=GuzthRJybpRS8HOidKRoXWcFV7sC4c.1
Friday, October 17, 2025, 11:00 am-12:00 pm
Abstract

Precision and recall are fundamental metrics in machine learning tasks where both accuracy and coverage are essential, including multi-label learning, language generation, medical studies, and recommender systems. In language generation, for example, hallucination reflects a failure of precision, where models output strings outside the true language, while mode collapse reflects a failure of recall, where some valid outputs are never produced. A central challenge in these settings is the prevalence of one-sided feedback, where only positive examples are observed during training. To address learning under such partial feedback, we introduce a Probably Approximately Correct (PAC) framework in which hypotheses are set functions that map each input to a set of labels, extending beyond single-label predictions and generalizing classical binary, multi-class, and multi-label models. Our results reveal sharp statistical and algorithmic separations from standard settings: classical methods such as Empirical Risk Minimization provably fail, even for simple hypothesis classes. We develop new algorithms that learn from positive data alone, achieve optimal sample complexity in the realizable case, and establish multiplicative (rather than additive) approximation guarantees in the agnostic case, where additive regret guarantees are impossible.
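
For readers unfamiliar with set-valued evaluation, the following minimal Python sketch (not part of the talk; the function names and edge-case conventions are our own) illustrates how precision and recall apply when a hypothesis outputs a set of labels, and how hallucination and mode collapse map onto the two metrics.

    def precision(predicted: set, true_labels: set) -> float:
        """Fraction of predicted labels that are correct.
        Low precision = hallucination: outputs outside the true set."""
        if not predicted:
            return 1.0  # assumed convention: an empty prediction hallucinates nothing
        return len(predicted & true_labels) / len(predicted)

    def recall(predicted: set, true_labels: set) -> float:
        """Fraction of true labels that are predicted.
        Low recall = mode collapse: some valid outputs are never produced."""
        if not true_labels:
            return 1.0  # assumed convention: nothing to cover
        return len(predicted & true_labels) / len(true_labels)

    # Toy example: the "true language" consists of the strings {"a", "b", "c"}.
    true_set = {"a", "b", "c"}
    print(precision({"a", "b", "z"}, true_set))  # 0.667 -- "z" is a hallucination
    print(recall({"a"}, true_set))               # 0.333 -- "b" and "c" never produced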

Bio

Han Shao is an Assistant Professor of Computer Science at the University of Maryland. Her research interests span machine learning theory, economics and computation, and algorithmic game theory. She focuses on fundamental questions arising from human social and adversarial behaviors in the learning process, examining how these behaviors shape machine learning systems and developing methods to enhance their accuracy and robustness. She is also broadly interested in foundational problems in learning theory. Previously, she was a postdoctoral fellow at the Center of Mathematical Sciences and Applications at Harvard University, hosted by Cynthia Dwork and Ariel Procaccia. She received her Ph.D. in Computer Science from the Toyota Technological Institute at Chicago, advised by Avrim Blum.

This talk is organized by Samuel Malede Zewdu.