PhD Proposal: Generating Visual Content: From Pixel Orders to Videos

Hanyu Wang

IRB-4109 Brendan Iribe Center for Computer Science and Engineering (IRB)

Saturday, January 27, 2024, 9:30-11:00 am

Abstract

Visual content generation is a fundamental challenge in computer vision with applications across many domains. Because visual data is high-dimensional, it is difficult to achieve both high quality and precise control in generation tasks. This thesis investigates visual generation across modalities, ranging from pixel-level ordering to complex spatiotemporal data synthesis.

We begin by addressing the foundational challenge of sequentially representing visual data through Neural Space-filling Curves, a data-driven approach that learns context-aware pixel orderings optimized for downstream tasks such as LZW compression. We then explore controlled image generation through two complementary approaches: Chop & Learn, a framework for compositional generation that enables synthesis of novel object-state combinations, and a multimodal style transfer method that effectively combines guidance from both images and text. For video generation, we introduce LARP, a novel tokenization approach with a learned autoregressive prior that achieves state-of-the-art performance while maintaining computational efficiency. Finally, we propose directions for future research that focus on two key areas: advancing latent visual diffusion models and adapting LLMs for high-fidelity visual generation.

Bio

Hanyu Wang is a PhD student in Computer Science at the University of Maryland, College Park, where he is advised by Prof. Abhinav Shrivastava. He holds a B.Eng. in Computer Science and Technology from Xi’an Jiaotong University and an M.S. in Computer Science from the University of Maryland. His research focuses on visual content generation.

This talk is organized by Migo Gui.