Talks

PhD Proposal: Assimilating and Assembling the Visual World

Kamal Gupta

4105 Brendan Iribe Center for Computer Science and Engineering (IRB)

Tuesday, November 30, 2021, 9:00-11:00 am

Abstract

In the last few years, a number of works have demonstrated the use of deep generative models to create visual data. These models can convert sketches into paintings, transfer style from one image to another, or convert an image of semantic labels into a realistic scene, and vice versa. Without a doubt, computational approaches will play a pivotal role in revolutionizing the way we create visual data. Existing approaches in unsupervised or generative modeling often perform image synthesis in one shot using a black-box model. Humans, on the other hand, arguably follow two quintessential steps in the process of creation: assimilation and assembly. In this dissertation, we take a deeper look at each of the two steps and lay the groundwork for repurposing deep generative models to aid humans with each of them.

The first part of the thesis focuses on “assimilation”, the process of consuming the data and understanding its constituent components. We draw inspiration from works on mid-level representations of images that aim to discover concepts in the form of patches that may correspond to objects or parts of objects. In this context, we present PatchVAE and PatchGame, two complementary approaches for discovering discrete concepts that recur across the dataset and compose in different ways to form an image.

The second part of the thesis focuses on “assembly”, the process of synthesizing a meaningful arrangement of known concepts to give rise to novel scenes sampled from the desired data distribution. To this end, we first present building blocks for assembling different modalities of visual data: assembling multiple views (Multiview Shapes) and assembling graphical primitives (LayoutTransformer). Lastly, we take a sneak peek at our current work on synthesizing textured meshes and never-ending scenes.

Examining Committee:

Chair: Dr. Abhinav Shrivastava
Department Representative: Dr. Matthias Zwicker
Members: Dr. Larry Davis, Dr. David Jacobs, Dr. Noah Snavely

Bio

Kamal Gupta is a Ph.D. student in the Department of Computer Science at the University of Maryland, College Park. His research focuses on understanding and recreating the visual world by learning to discover, represent, generate, and assemble primitives from images. This includes discovering discrete concepts in images and describing images or scenes as compositions of visual concepts. His long-term goal is to enable storytellers to create intricate 3D visual content seamlessly. He is a recipient of the Dean's Fellowship and the Kulkarni Fellowship.

This talk is organized by Tom Hurst