PhD Proposal: Towards Immersive Visual Content with Machine Learning

Brandon Feng

5105 Brendan Iribe Center for Computer Science and Engineering (IRB)

Wednesday, January 26, 2022, 12:00-2:00 pm

Abstract

Virtual and augmented reality technology is on the cusp of dramatically changing the way we see, learn, and engage with the rest of the world.

This exciting future promises the ability to generate and distribute immersive content to users. However, transforming data captured with physical cameras into content suitable for immersive experiences still requires overcoming numerous challenges.

In this proposal, I first discuss the problem of recovering depth information from videos captured with 360-degree cameras. Depth information is crucial for creating immersive visual experiences from real-world captured data because it 1) enables 3D rendering based on the viewer’s position, and 2) allows scene-editing effects such as relighting and object insertion. I present a novel method that unifies the representation of object depth and surface normals using double quaternions. Experimental results show that training with a double-quaternion-based loss function improves the prediction accuracy of a neural network that takes 360-degree video frames as input.
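To illustrate why per-pixel depth enables rendering from a new viewer position, here is a minimal sketch that lifts one pixel of an equirectangular 360-degree frame to a 3D point and re-expresses it relative to a moved viewer. The function names, the axis convention (y up, z forward at the image center), and the pixel-center offset are assumptions made for illustration; they are not taken from the proposal and do not represent its double-quaternion method.

```python
import math

def unproject(col, row, depth, width, height):
    """Lift an equirectangular pixel with a depth value to a 3D point.

    Assumed convention: longitude spans [-pi, pi) across the image width,
    latitude spans [pi/2, -pi/2] from top to bottom, y is up, and z points
    toward the image center.
    """
    lon = ((col + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (row + 0.5) / height) * math.pi
    x = depth * math.cos(lat) * math.sin(lon)
    y = depth * math.sin(lat)
    z = depth * math.cos(lat) * math.cos(lon)
    return (x, y, z)

def relative_to_viewer(point, viewer):
    """Express a 3D point in a frame translated to the viewer's position."""
    return tuple(p - v for p, v in zip(point, viewer))

# Example: a pixel 2 units away, seen again after the viewer steps 0.1 to the right.
point = unproject(col=100, row=50, depth=2.0, width=400, height=200)
shifted = relative_to_viewer(point, viewer=(0.1, 0.0, 0.0))
```

Without depth, only the viewing direction of each pixel is known, so the scene cannot be re-rendered correctly once the viewer moves away from the camera position.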

Next, I discuss the problem of efficiently representing 4D light fields. Light fields have significant potential for immersive visual applications. An important obstacle to their widespread adoption is the extreme cost of storing and transmitting such high-dimensional data. Building on past research into compressing light field content, I present a novel approach to representing light fields with neural networks.

Unlike prior methods that divide the light field content into patches and encode each patch separately, my approach treats the light field data as a mapping function from pixel coordinates to color. I demonstrate the feasibility of training a neural network to accurately learn such a mapping function, and show that embedding the light field pixel coordinates using Gegenbauer polynomials is crucial for achieving high reconstruction quality. Finally, I show that such a functional representation enables high-quality interpolation and super-resolution on light fields.
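The idea of a light field as a function from coordinates to color can be sketched as follows. This is a minimal illustration, not the architecture from the proposal: the 4D coordinate layout (u, v, s, t), the polynomial degree, the parameter alpha, the layer sizes, and all function names are assumptions. The Gegenbauer values are computed with the standard three-term recurrence, and the network is left untrained.

```python
import math
import random

def gegenbauer(n_max, alpha, x):
    """Return [C_0^alpha(x), ..., C_n_max^alpha(x)] via the three-term recurrence."""
    values = [1.0]
    if n_max >= 1:
        values.append(2.0 * alpha * x)
    for n in range(2, n_max + 1):
        c = (2.0 * x * (n + alpha - 1.0) * values[n - 1]
             - (n + 2.0 * alpha - 2.0) * values[n - 2]) / n
        values.append(c)
    return values

def embed(coord, n_max=4, alpha=1.0):
    """Embed a light field coordinate (u, v, s, t), each assumed scaled to [-1, 1]."""
    features = []
    for x in coord:
        features.extend(gegenbauer(n_max, alpha, x))
    return features

def init_mlp(sizes, seed=0):
    """Create randomly initialized (weights, biases) pairs for a small MLP."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers

def predict_color(layers, coord):
    """Map one 4D coordinate to an RGB triple in (0, 1)."""
    h = embed(coord)
    for i, (weights, biases) in enumerate(layers):
        h = [sum(w * x for w, x in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU on hidden layers
    return [1.0 / (1.0 + math.exp(-v)) for v in h]  # sigmoid output

# 4 coordinates x 5 polynomial values each = 20 input features.
layers = init_mlp([20, 32, 3])
rgb = predict_color(layers, (0.25, -0.5, 0.0, 0.75))
```

Because the representation is a continuous function, it can be queried at coordinates that lie between or beyond the captured samples, which is what makes interpolation and super-resolution possible once the weights are fitted to the light field data.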

I conclude my proposal by outlining ideas for further improving the efficiency of immersive content representation using neural networks.

Examining Committee:

Chair: Dr. Amitabh Varshney
Department Representative: Dr. Furong Huang
Members: Dr. Christopher Metzler

Bio

Brandon Yushan Feng is a PhD student in the Department of Computer Science at the University of Maryland, College Park, advised by Prof. Amitabh Varshney. His research develops new methods for solving problems in computer graphics and image processing, with a special focus on immersive visual content. He received his bachelor's and master's degrees from the University of Virginia.

This talk is organized by Tom Hurst