Challenges in Augmenting Large Language Models with Private Data
Ashwinee Panda
IRB 4105
Abstract
We consider the emerging problem of preventing LLMs from exhibiting "misaligned" behavior through the lens of robustness, by way of algorithmic stability. We present a set of challenging problems that arise from the fundamentally opaque nature of training datasets (e.g., because they are proprietary or private) and are amplified by autoregressive generation and massive model capacity. We consider inference-time mitigation strategies that can provide provable guarantees in production systems. The strategies we discuss apply to a wide range of problems in trustworthiness, but we focus on concrete concerns such as private data leakage, jailbreaks, prompt leaking, and the generation of offensive text.
Bio
Ashwinee is a 4th-year PhD student at Princeton University working with Prateek Mittal on trustworthy artificial intelligence and privacy-preserving machine learning. He received his B.S. and M.S. from UC Berkeley, where he researched federated learning in the RISE Lab, co-advised by Joey Gonzalez and Raluca Ada Popa.
This talk is organized by Samuel Malede Zewdu.