https://umd.zoom.us/j/
Recent rapid advancements in Artificial Intelligence (AI) have made it widely applicable across domains, from autonomous systems to multimodal content generation. However, these models remain susceptible to significant security and safety vulnerabilities. Such weaknesses can enable attackers to jailbreak AI systems, coercing them into performing harmful tasks or leaking sensitive information. As AI becomes increasingly integrated into critical applications such as autonomous robotics and healthcare, ensuring its safety grows ever more important. Understanding the vulnerabilities in today’s AI systems is crucial to addressing these concerns.
In this talk, I will explore key aspects of AI safety and security, focusing on robustness, detectability, and data privacy. We will analyze the challenges posed by adversarial attacks and draw insights toward making AI systems more secure and reliable.
In the first part of the talk, I will discuss our findings on the challenges of detecting AI-generated content. I will present our results on attacks that successfully evade and spoof current AI content detectors, and I will highlight our theoretical analysis of the inherent difficulty of detection. Following this, I will introduce our work on fast adversarial attacks against large language models (LLMs). I will explain how our beam search-based algorithm can efficiently jailbreak LLMs, induce hallucinations, and mount privacy attacks against them.
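As a rough illustration of the general approach (not the speaker's actual algorithm), the sketch below runs a beam search over adversarial suffix tokens appended to a prompt, scoring each candidate suffix by how likely the model is to produce a desired target continuation. The model choice ("gpt2"), the candidate-sampling heuristic, and all hyperparameters are assumptions for illustration only.

```python
# Minimal illustrative sketch: beam search for an adversarial suffix that
# raises the log-likelihood of a target continuation under a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # assumed model choice
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def target_logprob(prefix_ids, target_ids):
    """Log-probability of target_ids given prefix_ids under the LM."""
    ids = torch.cat([prefix_ids, target_ids]).unsqueeze(0)
    with torch.no_grad():
        logits = model(ids).logits[0]
    lp = torch.log_softmax(logits[:-1], dim=-1)       # logits[i] predicts ids[i+1]
    start = prefix_ids.shape[-1] - 1
    rows = lp[start : start + target_ids.shape[-1]]
    return rows.gather(1, target_ids.unsqueeze(1)).sum().item()

def beam_attack(prompt, target, suffix_len=5, beam_width=3, cand_per_beam=8):
    prompt_ids = tok(prompt, return_tensors="pt").input_ids[0]
    target_ids = tok(target, return_tensors="pt").input_ids[0]
    beams = [(prompt_ids, 0.0)]
    for _ in range(suffix_len):
        scored = []
        for ids, _ in beams:
            # Draw candidate next tokens from the LM itself, a heuristic
            # for keeping the adversarial suffix fluent.
            with torch.no_grad():
                next_logits = model(ids.unsqueeze(0)).logits[0, -1]
            for t in torch.topk(next_logits, cand_per_beam).indices:
                new_ids = torch.cat([ids, t.view(1)])
                scored.append((new_ids, target_logprob(new_ids, target_ids)))
        # Keep the beam_width suffixes that best score the target.
        scored.sort(key=lambda x: x[1], reverse=True)
        beams = scored[:beam_width]
    return tok.decode(beams[0][0])
```

Because the search only requires forward passes (no gradients through the model weights), attacks of this general shape can be comparatively fast; the actual objectives and candidate-selection rules in the work presented may differ.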
Next, I will address privacy concerns stemming from the widespread use of big data by showcasing our work on creating unlearnable datasets that protect user data from unauthorized model training. In the final part of the talk, I will outline future research directions in AI content detection and discuss new approaches for multimodal AI jailbreaking.
Vinu Sankar Sadasivan is a fourth-year PhD student in Computer Science at the University of Maryland, College Park, where he is advised by Prof. Soheil Feizi. His research focuses on Security and Privacy in AI, with an emphasis on robustness, detectability, and privacy. He is a 2023 recipient of the Kulkarni Fellowship.