PhD Defense: Toward Adaptive and Efficient Visual Synthesis of Appearance, Dynamics and Semantics
Yixuan Ren
IRB-4105 https://umd.zoom.us/my/yxren?pwd=CgvbcCPbywlw1yleeeKe6xpmzreDq4.1
Tuesday, April 14, 2026, 12:15-2:00 pm
Abstract

Generative modeling for visual content has rapidly evolved, enabling the synthesis of novel images and videos as well as the transformation of existing assets. In recent years, diffusion models have played a central role in advancing these capabilities. While training large-scale models on extensive data has led to remarkable quality and surprising generalization, their limited controllability and high complexity remain significant challenges. In this thesis, we investigate adaptive and efficient methods for a range of image editing and video generation tasks.

We begin by studying content-adaptive image color editing, where auxiliary color restoration tasks are introduced to capture users’ chromatic preferences. We then propose the one-shot video motion customization task, which adapts pre-trained text-to-video diffusion models on a single reference video to synthesize the reference motion with novel subjects and scenes. By analyzing spatiotemporal disentanglement along denoising timesteps, we further show that motion customization can be simplified to require no additional components or training procedures. Beyond diffusion in conventional latent spaces, we explore implicit video tokenization and diffusion models based on implicit neural representations, which holistically synthesize videos by generating neural weights, yielding compact representations and efficient generation. Finally, we present a noise-free flow-matching framework that directly evolves source image latents toward target image latents for precise and efficient instructional image editing, suggesting a path toward unification across generative paradigms.

Bio

Yixuan Ren is a PhD candidate in Computer Science at the University of Maryland, College Park, advised by Prof. Abhinav Shrivastava. His research focuses on image and video synthesis and editing with generative models and visual tokenizers.


Examining Committee Chair: Dr. Abhinav Shrivastava

Dean's Representative: Dr. Shuvra Bhattacharyya

Members:

Dr. Ramani Duraiswami

Dr. Matthias Zwicker

Dr. Ruohan Gao

This talk is organized by Migo Gui