PhD Defense: Learning and Composing Primitives for the Visual World
Kamal Gupta
Wednesday, April 12, 2023, 9:00-11:00 am
Abstract
Compositionality is at the core of how humans understand and create visual data. For computational approaches to assist humans in creative tasks, they must be able to understand and perform composition. Recent advances in deep generative models make it possible to turn random noise into highly realistic scenes. However, to harness these models for real-world applications, I argue that we must be able to represent and control the generation process as a composition of interpretable primitives.

In the first half of this talk, I’ll discuss how deep models can discover such primitives from visual data. By having two neural network agents play a cooperative referential game, we can represent images with discrete, meaningful concepts without supervision. I then extend this work to image and video editing by learning a dense correspondence of primitives across images. In the second half, I’ll focus on learning how to compose primitives for both 2D and 3D visual data. By expressing scenes as an assembly of smaller parts, we can perform generation either from scratch or from partial scenes given as input. I’ll conclude with a discussion of possible future directions and applications of generative models, and of how we can better enable users to guide the creative process.

Examining Committee

Chair:

Dr. Abhinav Shrivastava

Dean's Representative:

Dr. Carol Y. Espy-Wilson

Members:

Dr. Larry Davis

Dr. Matthias Zwicker

Dr. Noah Snavely

Bio

Kamal Gupta is a Ph.D. candidate at the University of Maryland, College Park. His research focuses on learning from rich visual data and building efficient, compositional generative models. His long-term goal is to enable storytellers to create intricate visual content seamlessly. He is a recipient of the Dean’s Fellowship and the Kulkarni Fellowship. He serves as a mentor with CVPR Academy, the SIGGRAPH Research Career Development Committee, and the Iribe Initiative for Inclusion and Diversity. Before his Ph.D., he spent five years in industry shipping machine-learning models for low-compute devices.

This talk is organized by Tom Hurst.