Talks

PhD Defense: Learning and Composing Primitives for the Visual World

Kamal Gupta

4105 Brendan Iribe Center for Computer Science and Engineering (IRB)
https://umd.zoom.us/j/9394768822?pwd=Si9VaXExbFZXTDdZZkJzWGRzUW9PZz09

Wednesday, April 12, 2023, 9:00-11:00 am

Abstract

Compositionality is at the core of how humans understand and create visual data. For computational approaches to assist humans in creative tasks, it is crucial that they understand and perform composition. Recent advances in deep generative models have enabled us to convert noise into highly realistic scenes. However, to harness these models for building real-world applications, I argue that we need to be able to represent and control the generation process through the composition of interpretable primitives.

In the first half of this talk, I’ll discuss how deep models can discover such primitives from visual data. By playing a cooperative referential game between two neural network agents, we can represent images with discrete, meaningful concepts without supervision. I further extend this work to applications in image and video editing by learning a dense correspondence of primitives across images. In the second half, I’ll focus on learning how to compose primitives for both 2D and 3D visual data. By expressing scenes as assemblies of smaller parts, we can easily generate scenes from scratch or from partial scenes given as input. I’ll conclude the talk with a discussion of possible future directions and applications of generative models, and how we can better enable users to guide the creative process.

Examining Committee

Chair:	Dr. Abhinav Shrivastava
Dean's Representative:	Dr. Carol Y. Espy-Wilson
Members:	Dr. Larry Davis
	Dr. Matthias Zwicker
	Dr. Noah Snavely

Bio

Kamal Gupta is a Ph.D. candidate at the University of Maryland, College Park. His research focuses on learning from rich visual data and building efficient, compositional generative models. His long-term goal is to enable storytellers to create intricate visual content seamlessly. He is a recipient of the Dean’s Fellowship and the Kulkarni Fellowship. He is a mentor at the CVPR Academy, the SIGGRAPH Research Career Development Committee, and the Iribe Initiative for Inclusion and Diversity. He also worked in industry for five years, shipping machine-learning models for low-compute devices.

This talk is organized by Tom Hurst.