Making Robust AI Safeguards Run Deep

Stephen Casper

IRB 4105 or https://umd.zoom.us/j/93666933047?pwd=gWgqOgGbBP6laZclyURdDG2mNdArBt.1

Thursday, April 9, 2026, 11:00 am-12:00 pm

Abstract

In 2025, frontier AI developers started warning that their AI systems were beginning to cross risk thresholds related to cyber, chemical, and biological capabilities. This is unfortunate given that closed-weight AI systems remain persistently vulnerable to prompt-injection attacks and open-weight systems remain persistently vulnerable to malicious fine-tuning. This presentation will focus on tools for making frontier AI safeguards “run deep,” with particular attention to technical tools for safeguarding open-weight systems. Finally, we will discuss the challenge of making AI safeguards research matter in the real world. Along the way, we will discuss what AI safety can learn from the design of lightbulbs and why you should keep a close eye on Arkansas Attorney General Tim Griffin in 2026.

Bio

Stephen “Cas” Casper is a final-semester PhD student at MIT and a Harvard Berkman Klein Fellow. Previously, he was a research resident at the UK AI Security Institute. He is a writer for the International AI Safety Report and the Singapore Consensus, and he leads a research stream with the MATS program. His work focuses on technical challenges in AI safeguards, evaluations, and risk management, with an emphasis on informing policymakers about best technical practices for managing AI risks. His research has been recognized with a Hoopes Prize, an ML Safety Workshop best paper award, a BioSafeGenAI best paper runner-up, a GenLaw spotlight paper award, a TMLR outstanding paper finalist distinction, and a few dozen mentions in news articles and newsletters. You can find more information on his Google Scholar and website.

This talk is organized by Samuel Malede Zewdu