Generative models have shown promising capabilities in image and video synthesis. A pre-trained generative model, e.g., a Generative Adversarial Network (GAN), provides a powerful generative prior for downstream tasks such as image editing and Image Super-Resolution (ISR). However, directly applying these models to videos introduces temporal inconsistency, or flickering, and handling out-of-domain (OOD) data remains challenging. In this proposal, we explore the use of pre-trained image generative models for video tasks. We start with the video semantic editing task and propose a flow-based approach to achieve temporal consistency. To enhance the model's editability on OOD data, we then propose decomposing the input into in-distribution and out-of-distribution components by leveraging a pre-trained 3D GAN. However, GANs are typically limited to a specific category, e.g., human faces or animals. To target more generic scenarios, we present a large-scale Video Super-Resolution (VSR) model, VideoGigaGAN, which produces detail-rich and temporally stable outputs on generic data by adapting its image counterpart to video. As future work, we propose to explore generic obstruction removal and to analyze high-frequency flickering in diffusion models.
Yiran Xu is a PhD student advised by Prof. Jia-Bin Huang. His research focuses on generative models and their applications, especially video editing and restoration. He has interned at Adobe and Snapchat. He received his Master's degree from UC San Diego and his Bachelor's degree from South China University of Technology.