As foundation models continue to transform the landscape of artificial intelligence, demonstrating impressive capabilities across vision and language tasks, the need for interpretability and control becomes increasingly critical. My research aims to develop methods for interpreting such models by localizing the knowledge they encode and leveraging this understanding to enable model editing and machine unlearning. My work spans three major areas: interpretability of vision models, where I propose methods for mapping internal representations to human-understandable concepts and for explaining failure modes; knowledge localization and editing in text-to-image generative models, including techniques to identify and modify the layers responsible for specific concepts; and machine unlearning in large language models, where I introduce benchmarks and algorithms that improve unlearning efficacy, particularly through the use of intermediate checkpoints. Through these efforts, my research contributes to building more transparent, controllable, and adaptable AI systems.
Keivan Rezaei is a third-year Ph.D. student at the University of Maryland, advised by Prof. Feizi and Prof. Hajiaghayi. His research focuses on the interpretability of generative AI models from two perspectives: a model perspective, localizing knowledge within models and detecting and explaining their failure modes; and a data perspective, analyzing the impact of individual data points on a model through challenges such as unlearning and data selection for language model pretraining. He has also proposed methods for integrating advertisements into LLM outputs as a strategy for effective monetization.