The rapid adoption of generative AI has created a cycle in which personal information cascades perpetually: from people to models to applications and online platforms, then back through scrapers into the system. Simple blanket rules such as "don't train on this data" or "don't share sensitive information" are inadequate: training data is scarce, and these models are already deeply integrated into people's daily lives. In this talk, rather than examining data, people, and models in isolation and setting rigid rules, we will reason about their interplay through three research directions: (1) measuring the imprint of data on models through novel membership inference attacks and uncovering memorization patterns, (2) developing algorithmic approaches that help people control the exposure of their data while preserving utility, and (3) grounding model evaluations in legal and social frameworks, particularly the theory of contextual integrity. Looking ahead, we will discuss emerging directions in building on-device privacy controls and nudging mechanisms, formalizing semantic memorization, and developing model capabilities such as abstraction, composition, and inhibition to enable controllable generation.
Niloofar Mireshghallah is a post-doctoral scholar at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. She received her Ph.D. from the CSE department at UC San Diego in 2023. Her research interests include privacy in machine learning, natural language processing, and the intersection of generative AI and law. She received the National Center for Women & IT (NCWIT) Collegiate Award in 2020, was a finalist for the Qualcomm Innovation Fellowship in 2021, and was named a 2022 Rising Star in Adversarial ML and Rising Star in EECS.