From Pixels to Systems: How Generative and Agentic Techniques Reshape Computational Imaging, and Beyond
Zhengzhong Tu
IRB 4105
Wednesday, September 10, 2025, 11:00 am-12:00 pm
Abstract

Large diffusion models have revolutionized text-to-image generation, opening up vast opportunities for conditional tasks such as image editing, restoration, and content creation. Yet, many computational imaging problems, especially those requiring real-time, on-device execution, face challenges in adopting such large-scale generative models. In this talk, I will trace our efforts to bridge this gap. I will begin with efficient architectures for image enhancement (MAXIM, CVPR 2022), and then discuss our recent advances in leveraging pre-trained diffusion models for versatile image restoration. This includes conditional diffusion distillation (CoDi, CVPR 2024) for high-fidelity, accelerated generation, and language-guided control of pre-trained diffusion models for restoration (SPIRE, ECCV 2024). I will also share our award-winning solution for the NTIRE 2025 short-form UGC video enhancement challenge hosted by Kwai. Building on these foundations, I will present our latest work, 4KAgent (Preprint), which demonstrates how the reasoning and planning capabilities of large language models can orchestrate multiple expert restoration tools to achieve highly realistic, universal 4K upscaling for any image type. This represents a step toward a broader paradigm for agentic computer vision, which shifts the focus from isolated model development to system-level intelligence, enabling vision systems that reason, plan, and act in concert to solve more generic and practical problems in the real world.

Bio

Dr. Zhengzhong Tu has been an Assistant Professor of Computer Science and Engineering at Texas A&M University since September 2024. He received his Ph.D. from the University of Texas at Austin in 2022, advised by Cockrell Family Regents Endowed Chair Professor Alan Bovik. Before joining Texas A&M, Dr. Tu was an AI researcher at Google Research, focusing on generative foundation models for on-device applications. He has published in IEEE TPAMI, IEEE TIP, NeurIPS, ICLR, CVPR, ECCV, ICCV, ICRA, WACV, and CoRL, among other venues. He has co-organized the 2nd and 3rd Workshops on LLVM-AD at ITSC 2024 and WACV 2025, the 1st Workshop on WDFM-AD at CVPR 2025, the 2nd MetaFood Workshop at CVPR 2024, and the 1st E2E 3D Workshop at ICCV 2025. He received the 1st-place winning solution award in the AI4Streaming Challenge at CVPR 2024 and 1st place in the NTIRE 2025 Short-form UGC VQAE Challenge. He serves as an Associate Editor of IEEE TIP and Co-Chair of the SOGAI special group in VQEG. He was a CVPR 2022 Best Paper Finalist and a recipient of the CVPR 2025 MEIS Workshop Best Paper Award and a 2025 Google Research Scholar Award; his work has been headlined in the Google Research annual blog and featured in Google I/O media coverage.

This talk is organized by Samuel Malede Zewdu