Powerful Open-Weight AI Models: Wonderful, Terrible, and Inevitable — How Can We Make Them Safer?
Stephen Casper
IRB 4105 or Zoom https://umd.zoom.us/j/2742814228?pwd=MDI4NFRBZXhjNGlCc3RKUThCTTlDUT09&omn=97071787820&jst=2
Monday, October 27, 2025, 11:00 am-12:00 pm
Abstract

Frontier AI models with openly available weights are steadily becoming more powerful and widely adopted. However, compared to proprietary models, open-weight models pose different challenges to effective risk management. There is also relatively little research on safety tooling specific to them. Addressing these gaps will be key to both realizing the benefits and mitigating the harms of open-weight models. This talk will focus on 16 open technical challenges for open-weight model safety involving pretraining, fine-tuning, evaluations, deployment, and ecosystem monitoring. It will discuss the nascent state of the field, emphasizing that openness about research, methods, and evaluations (not just weights) will be key to building a rigorous science of open-weight model risk management.

Bio

Stephen Casper (commonly known as Cas) is a final-year Ph.D. student in Computer Science at the Massachusetts Institute of Technology (MIT), advised by Dylan Hadfield-Menell in the Algorithmic Alignment Group. His research focuses on technical AI safeguards and governance. Cas leads a research stream for the ML Alignment & Theory Scholars (MATS) program and mentors for the ERA Fellowship. He contributes as a writer to the International AI Safety Report and the Singapore Consensus and is supported by the Vitalik Buterin Fellowship from the Future of Life Institute. Previously, he worked with the Harvard Kreiman Lab, the Center for Human-Compatible AI, and the UK AI Security Institute. https://stephencasper.com/

Hosted by: Furong Huang

This talk is organized by Samuel Malede Zewdu