Generative foundation models for images and videos are pre-trained on internet-scale data, enabling them to learn broad visual priors for general-purpose generation. However, they are typically conditioned only on text prompts or a single reference image, which limits their applicability to real-world tasks that require richer visual guidance to produce specific, goal-directed outputs. In addition, training such models from scratch is prohibitively expensive for most domain-specific applications.
This proposal investigates how pre-trained generative foundation models can be adapted to real-world image and video tasks through data-efficient adaptation and enhanced visual conditioning. It argues that, with only limited domain-specific data, these general-purpose models can be transformed into effective tools for simulation and imagination, with applications in image and video generation, editing, and robotic simulation.
Jingxi Chen is a fourth-year Computer Science PhD student. His research focuses on image, video, and multimodal generation, with an emphasis on adapting powerful generative foundation models to real-world applications such as content creation, editing, reconstruction, and robotic simulation. He has been a research intern at Dolby and Amazon and is an incoming research intern at Google Research. His work has been published at top CV/ML conferences, including CVPR, ICCV, and NeurIPS.
Examining Committee Chair: Dr. John Aloimonos
Department Representative: Dr. Ramani Duraiswami
Members: Dr. Christopher Metzler

