PhD Proposal: Adaptive and Efficient Visual Generation of Color, Motion, and Video
Yixuan Ren
Friday, March 28, 2025, 3:00-4:30 pm
Abstract

Zoom: https://umd.zoom.us/j/6387044500?pwd=CgvbcCPbywlw1yleeeKe6xpmzreDq4.1

Visual generation has rapidly evolved to synthesize novel images and videos or transform existing ones. In recent years, the development of diffusion models has significantly enhanced these capabilities. However, state-of-the-art models are typically large in scale and trained on extensive datasets, resulting in overly generic outputs and significant computational overhead. In this thesis, we explore adaptive and efficient methods for image editing and video creation.

We start by presenting a content-adaptive image color editing model with auxiliary color restoration tasks to capture user color preferences. For video generation, we propose customizing pre-trained text-to-video diffusion models on reference motions to adapt them to new subjects and scenarios. We further analyze the spatiotemporal disentanglement properties of pre-trained video diffusion models along the denoising timesteps, leading to a design with fewer modules and training stages. Building on implicit neural representations, we investigate implicit video diffusion models that synthesize videos holistically by generating neural weights, highlighting their compact representation and efficient decoding. In future work, we propose to extend our research in directions including fine-grained motion generation and progressive video frame generation.

Bio

Yixuan Ren is a PhD student in Computer Science at the University of Maryland, College Park. He is advised by Prof. Abhinav Shrivastava. His research lies in image and video synthesis and editing with deep generative models, as well as generative implicit neural representations.

This talk is organized by Migo Gui.