Paradigms of AI alignment: components and enablers
Tuesday, November 21, 2023, 1:00-2:00 pm
Registration requested: The organizer of this talk requests that you register if you are planning to attend.

Abstract

The goal of AI alignment is to figure out how to get advanced AI systems to do what we want them to do and to avoid knowingly acting against our interests. Alignment research focuses either on developing the components of an aligned AI system (e.g., reward design and generalization) or on enabling more effective work on those components (e.g., by improving interpretability or theoretical understanding). This talk will give an overview of research directions in each of these areas and explain how alignment research at Google DeepMind fits into this framework.

Bio

Victoria is a Senior Research Scientist on the Alignment team at Google DeepMind. She currently focuses on evaluating dangerous capabilities in large language models. Her past research includes power-seeking incentives, specification gaming, and avoiding side effects. She holds a PhD in statistics and machine learning from Harvard University.


Note: Please register using the Google Form on our website https://go.umd.edu/marl for access to the Google Meet and talk resources.

This talk is organized by Saptarashmi Bandyopadhyay