PhD Defense: An efficient neural representation for videos
As the popularity of videos increases, it becomes crucial to find efficient and compact ways of representing them to facilitate their storage, transmission, and use in downstream video tasks. During his PhD, Hao introduced an innovative neural representation for videos called NeRV, in which each video is stored implicitly as a neural network. Building on NeRV, he proposed a hybrid neural representation for videos (HNeRV) with improved internal generalization and representation capacity. HNeRV allows for highly efficient video representation and compression, with a model size up to 1000 times smaller than the original raw video. Besides efficiency, HNeRV’s simple decoding process, a feedforward operation, enables fast video loading and easy deployment. Consequently, we developed an efficient neural video dataloader (NVLoader) that is 3-6 times faster than conventional video dataloaders. To address encoding speed, we introduced the HyperNeRV framework, which uses a hypernetwork to map input videos directly to NeRV model weights, speeding up encoding by a factor of 10^4. Aside from developing compact and implicit neural video representations, we explore several compelling applications based on them, such as frame interpolation, video restoration, and video editing. Moreover, their compactness makes these representations well suited both as an output video format, where they significantly reduce the search space, and as an efficient input for video understanding models.
Hao Chen is a PhD candidate at the University of Maryland, College Park, advised by Abhinav Shrivastava. He received his master's and bachelor's degrees from Huazhong University of Science and Technology (HUST).
This talk is organized by Tom Hurst.