The Offline Multi-agent Reinforcement Learning (MARL) Coordination Problem
Tuesday, October 24, 2023, 4:00-5:00 pm
Registration requested: The organizer of this talk requests that you register if you are planning to attend.

Abstract

Training multiple agents to coordinate is an important problem with applications in robotics, game theory, economics, and social sciences. However, most existing Multi-Agent Reinforcement Learning (MARL) methods are online and thus impractical for real-world applications in which collecting new interactions is costly or dangerous. While these algorithms should leverage offline data when available, doing so gives rise to the offline coordination problem. Specifically, we identify and formalize the strategy agreement (SA) and the strategy fine-tuning (SFT) challenges, two coordination issues at which current offline MARL algorithms fail. To address these challenges, we propose a simple model-based approach that generates synthetic interaction data and enables agents to converge on a strategy while fine-tuning their policies accordingly. Our resulting method, Model-based Offline Multi-Agent Proximal Policy Optimization (MOMA-PPO), outperforms prevalent learning-based methods on challenging offline multi-agent MuJoCo tasks, even under severe partial observability and with learned world models.
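To make the core idea concrete, here is a minimal toy sketch (not the speaker's code) of the model-based mechanism the abstract describes: a world model fit on offline data is rolled forward to generate synthetic multi-agent interactions, which the agents can then use to agree on and fine-tune a joint strategy. All function names, the toy dynamics, and the coordination reward below are illustrative assumptions.

```python
# Toy sketch of model-based synthetic data generation for offline MARL.
# Everything here (dynamics, reward, policies) is hypothetical, chosen only
# to illustrate the rollout loop, not to reproduce MOMA-PPO.

def learned_world_model(state, joint_action):
    """Stand-in for a world model fit on the offline dataset.
    Toy deterministic dynamics with a shared coordination reward."""
    next_state = state + sum(joint_action)
    # Agents are rewarded only when their actions match (coordination).
    reward = 1.0 if joint_action[0] == joint_action[1] else 0.0
    return next_state, reward

def generate_synthetic_rollout(policies, start_state, horizon=5):
    """Roll the learned model forward to produce synthetic interaction
    data shared by all agents, enabling offline fine-tuning."""
    state, rollout = start_state, []
    for _ in range(horizon):
        joint_action = [pi(state) for pi in policies]
        next_state, reward = learned_world_model(state, joint_action)
        rollout.append((state, tuple(joint_action), reward, next_state))
        state = next_state
    return rollout

# Two toy policies that have converged on the same action.
policies = [lambda s: 1, lambda s: 1]
data = generate_synthetic_rollout(policies, start_state=0)
```

In the actual method, the synthetic transitions would feed a PPO-style update for each agent; this sketch only shows where that data comes from.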

Bio

Paul Barde is a Ph.D. candidate at McGill University and Mila (Quebec AI Institute) co-supervised by Prof. Derek Nowrouzezahrai and Prof. Christopher Pal.

He works on sequential decision-making, focusing on multi-agent problems such as the emergence of communication, brain-computer interfaces, and coordination and cooperation challenges. He is keen to leverage data and simulators, when available, through model-based planning and learning, or through imitation, inverse, and offline reinforcement learning approaches.

Paul's research goal is to leverage data-driven multi-agent sequential decision-making to model intricate problems and assist us with complex decision processes. He is particularly interested in applying agent-based modeling and mechanism design approaches to biodiversity conservation challenges.

During his Ph.D., he has worked at Ubisoft La Forge with Dr. Olivier Delalleau, at INRIA (National Institute for Research in Digital Science and Technology) with Prof. Pierre-Yves Oudeyer, and most recently at FAIR (Meta Fundamental AI Research) with Prof. Amy Zhang.

Note: Please register using the Google Form on our website https://go.umd.edu/marl for access to the Google Meet and talk resources.

This talk is organized by Saptarashmi Bandyopadhyay