The loss of control of AI: which training paradigms do or don't encourage advanced AI to take over
Michael Cohen
IRB 4105 or https://umd.zoom.us/j/93666933047?pwd=gWgqOgGbBP6laZclyURdDG2mNdArBt.1
Monday, March 30, 2026, 11:00 am-12:00 pm
Abstract

Reinforcement learning agents are trained to maximize their long-term reward. This gives them the incentive to secure complete control over their reward, if they can confidently do so. Complete control requires blocking human control. This talk will discuss that issue, how AI companies may be heading toward it, and several ways it can be solved, including pessimism, human imitation, and myopia. Unfortunately, these solutions appear to unavoidably reduce the system's capability. This suggests we will need international coordination to avoid a race to the bottom.
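To make the incentive concrete, here is a minimal sketch (not from the talk; the reward numbers are hypothetical) contrasting the standard long-horizon return with the myopic objective mentioned above. A long-horizon agent can prefer a trajectory that sacrifices immediate reward to secure large future reward, while a myopic agent, which only values the immediate step, cannot.

```python
def discounted_return(rewards, gamma=0.99):
    """Standard RL objective: sum of gamma^t * r_t over the whole trajectory."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

def myopic_return(rewards):
    """Myopic objective: only the immediate reward counts, so there is
    no incentive to sacrifice now in order to control reward later."""
    return rewards[0]

# Hypothetical trajectories: "takeover" forgoes reward now to lock in
# maximal reward later; "honest" earns moderate reward throughout.
takeover = [0.0] + [1.0] * 100
honest = [0.5] * 101

# The long-horizon agent prefers the takeover trajectory...
print(discounted_return(takeover) > discounted_return(honest))  # True
# ...while the myopic agent prefers the honest action.
print(myopic_return(honest) > myopic_return(takeover))  # True
```

As the talk's abstract notes, this safety comes at a cost: a truly myopic agent also forgoes beneficial long-term planning, which is one sense in which these solutions reduce capability.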

Bio

Michael K. Cohen is a postdoc at the Center for Human-Compatible AI at UC Berkeley and a Scientific Advisor at LawZero. He received his PhD from Oxford and a master's degree from the Australian National University. His research considers how the training processes for artificial agents affect the agents' incentives. He also designs theoretical agents that human operators could likely maintain control over, regardless of the agents' cognitive capabilities.

This talk is organized by Samuel Malede Zewdu.