PhD Proposal: Examining the roles of Internally and Externally Learned Priors for Video Modeling
Gaurav Shrivastava
Tuesday, March 26, 2024, 11:00 am-12:30 pm
Abstract

This thesis investigates the role of priors in video modeling. Video modeling priors can be categorized as follows: 1) Internally learned priors, which rely solely on the internal statistics of the test video. These priors are more prevalent in the field of computational photography. 2) Externally learned priors, which rely on an external training data corpus to learn video dynamics. These priors have an advantage over internally learned priors in that they are better suited to extrapolation, and hence are used more often in generation tasks. In this thesis, we explore multiple restoration, enhancement, and generative tasks for videos by formulating either an internally learned or an externally learned prior, and we highlight the successes and limitations of our proposed approaches.

In the first part of the thesis, we explore methods that exploit the internal statistics of videos to perform various restoration and enhancement tasks. Here, we show how robustly these methods perform restoration tasks such as denoising, super-resolution, frame interpolation, and object removal. Furthermore, in a follow-up work, we utilize the inherent compositionality of videos together with their internal statistics to perform a wider variety of enhancement tasks, such as relighting, dehazing, and foreground/background manipulation.

In the second part, we explore generative modeling that exploits an external corpus for learning priors. The task here is video prediction, i.e., extrapolating future sequences given a few context frames. In our work, we demonstrate that we are able to extrapolate not just one future sequence but multiple possible future sequences from the given context frames.

Lastly, we provide insight into our future work, where we explore a video prediction model that performs multistep video prediction, similar to how diffusion models work for images.

Bio

Gaurav Shrivastava is a Ph.D. candidate in the Department of Computer Science at the University of Maryland, College Park, advised by Prof. Abhinav Shrivastava. His research broadly encompasses Video Generation, Computational Photography, and Diffusion Models.

This talk is organized by Migo Gui