We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It comprises a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis, despite being trained without any ground-truth action labels or the other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training the generalist agents of the future.
Jack Parker-Holder is a Research Scientist at Google DeepMind in the Open-Endedness Team and an Honorary Lecturer in Computer Science at University College London. His work focuses on training generative world models from Internet videos. Prior to DeepMind, he completed his DPhil at the University of Oxford, supervised by Prof. Stephen Roberts, during which he spent time as an intern at FAIR and Aspect Capital. Previously, he was a Vice President at JPMorgan Chase in New York, where he also studied for a Master's degree at Columbia University. Jack completed his undergraduate degree in Mathematics at the University of Exeter. He has previously organized workshops including ALOE at ICLR 2022 and NeurIPS 2023, the EGG workshop at RSS 2024, and the workshop on Foundation Models for Decision Making at NeurIPS 2022.
Note: Please register using the Google Form on our website, https://go.umd.edu/marl, for access to the Google Meet, the Open-source Multi-Agent AI Research Community, and talk resources.