Conventions are crucial for strong performance in cooperative multi-agent games because they allow players to coordinate on a shared strategy without explicit communication. Unfortunately, standard multi-agent reinforcement learning techniques, such as self-play, converge to conventions that are arbitrary and non-diverse, leading to poor generalization when interacting with new partners. In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), encouraging the conventions to be semantically different. To ensure that learned policies act in good faith despite the adversarial optimization of cross-play, we introduce mixed-play, where an initial state is generated by sampling from self-play and cross-play transitions and the player learns to maximize the self-play reward from this initial state. We analyze the benefits of our technique on various collaborative multi-agent games, including Overcooked, and find that our technique outperforms competing methods when paired with real users.
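The self-play/cross-play trade-off at the heart of the talk can be illustrated with a minimal sketch. Below is a toy PyTorch example on a one-step coordination game where players are rewarded only for picking the same action, so every "match on action k" strategy is a valid but arbitrary convention. The function names (`expected_reward`, `train_convention`), the weight `lambda_xp`, and all hyperparameters are illustrative assumptions, not the paper's implementation; the mixed-play correction is omitted here because it requires multi-step environment dynamics.

```python
# Toy coordination game: both players pick one of N_ACTIONS actions and
# receive reward 1 iff they match, so there are N_ACTIONS equally good but
# mutually incompatible conventions. (Illustrative sketch, not the paper's code.)
import torch

torch.manual_seed(0)
N_ACTIONS = 3
R = torch.eye(N_ACTIONS)  # R[a, b] = 1 iff the two players choose the same action


def expected_reward(logits_a, logits_b):
    """Expected reward when the players act with softmax policies."""
    pa = torch.softmax(logits_a, dim=0)
    pb = torch.softmax(logits_b, dim=0)
    return pa @ R @ pb


def train_convention(pool, lambda_xp=1.0, steps=500, lr=0.1):
    """Learn a convention that maximizes self-play reward while minimizing
    cross-play reward against previously discovered conventions."""
    logits = (0.1 * torch.randn(N_ACTIONS)).requires_grad_()  # break symmetry
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        sp = expected_reward(logits, logits)  # self-play term
        if pool:  # cross-play term against the frozen pool of conventions
            xp = torch.stack([expected_reward(logits, p) for p in pool]).mean()
        else:
            xp = torch.tensor(0.0)
        loss = -(sp - lambda_xp * xp)  # ascend self-play, descend cross-play
        opt.zero_grad()
        loss.backward()
        opt.step()
    return logits.detach()


pool = []
for i in range(N_ACTIONS):
    conv = train_convention(pool)
    pool.append(conv)
    probs = torch.softmax(conv, dim=0).numpy().round(2)
    print(f"convention {i}: action probabilities = {probs}")
```

In this toy setting, each run should converge to a near-deterministic policy on a different action: the cross-play penalty steers every new convention away from those already in the pool, which is the diversity effect the abstract describes.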
Andy Shih is a PhD student at Stanford University, advised by Stefano Ermon and Dorsa Sadigh. His research interests include probabilistic inference, generative models, and multi-agent systems. His current focus is on bridging tractable probabilistic inference with the scalability and power of modern deep learning architectures.