Reinforcement learning agents are trained to maximize their long-term reward. This gives them an incentive to secure complete control over their reward, if they can confidently do so, and securing complete control requires blocking human control. This talk will discuss that problem, how AI companies may be heading toward it, and several ways it can be solved, including pessimism, human imitation, and myopia. Unfortunately, these solutions all appear to come at an unavoidable cost to the system's capability. This suggests we will need international coordination to avoid a race to the bottom.
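As a minimal illustration of the objectives at issue (the notation here is mine, not from the talk): a standard RL agent is trained to maximize expected discounted return over a long horizon, whereas a myopic agent, one of the proposed fixes, optimizes only its immediate reward:

\[
\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\Big[\textstyle\sum_{t=0}^{\infty} \gamma^{t} r_{t}\Big]
\qquad \text{vs.} \qquad
\pi^{\text{myopic}} \;=\; \arg\max_{\pi}\; \mathbb{E}_{\pi}\big[r_{0}\big],
\]

where \(\gamma \in (0,1)\) is the discount factor. Only the long-horizon objective rewards a policy for seizing control over future rewards; the myopic objective removes that incentive, but also forgoes long-term planning, which is the capability cost the abstract refers to.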
Michael K. Cohen is a postdoc at the Center for Human-Compatible AI at UC Berkeley and a Scientific Advisor at LawZero. He received his PhD from the University of Oxford and a master's degree from the Australian National University. His research examines how the training processes of artificial agents shape those agents' incentives. He also designs theoretical agents over which human operators could likely maintain control, regardless of the agents' cognitive capabilities.

