Cameras have become so accessible that anyone can spontaneously capture videos of memorable moments in daily life.
Video editing has thus emerged as an essential process for transforming a raw, casually captured video into a shareable visual story.
However, casual videos often intertwine complex scene motion with arbitrary camera motion, posing a challenge for advanced editing tasks such as camera viewpoint changes and object-level manipulation. While recent generative models exhibit a powerful capability to follow text instructions for video editing, such high-level textual input does not allow fine-grained control, preventing users from effectively steering the result toward their desired outcome.
In this proposal, we explore challenging editing tasks with flexible, fine-grained controllability: (1) editing the camera viewpoint of a casual, real-life video and rendering it from novel viewpoints; and (2) manipulating dynamic objects in a video via layer decomposition of individual objects and their correlated effects, enabling a wide range of object-level edits.
Yao-Chih Lee is a 4th-year PhD student advised by Prof. Jia-Bin Huang. He has interned at Google DeepMind and Adobe Research. His research primarily focuses on video synthesis and 3D computer vision.

