Foundations of Multisensory Artificial Intelligence
Paul Liang
IRB 4105 or https://umd.zoom.us/j/95853135696?pwd=VVEwMVpxeElXeEw0ckVlSWNOMVhXdz09
Wednesday, April 10, 2024, 11:00 am-12:00 pm
Abstract

Building multisensory AI systems that learn from multiple sensory inputs such as text, speech, video, real-world sensors, wearable devices, and medical data holds great promise for many scientific areas, with practical benefits such as supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents. In this talk, I will discuss my research on the machine learning principles of multisensory intelligence, as well as practical methods for building multisensory foundation models over many modalities and tasks. In the first half, I will present a theoretical framework formalizing how modalities interact with each other to give rise to new information for a task. These interactions are the basic building blocks in all multimodal problems, and their quantification enables users to understand their multimodal datasets and design principled approaches to learn these interactions. In the second half, I will present my work on cross-modal attention and multimodal transformer architectures that now underpin many of today’s multimodal foundation models. Finally, I will discuss our collaborative efforts in scaling AI to many modalities and tasks for real-world impact on affective computing, mental health, and cancer prognosis.

Bio

Paul Liang is a Ph.D. student in Machine Learning at CMU, advised by Louis-Philippe Morency and Ruslan Salakhutdinov. He studies the machine learning foundations of multisensory intelligence to design practical AI systems that integrate, learn from, and interact with a diverse range of real-world sensory modalities. His work has been applied in affective computing, mental health, pathology, and robotics. He is a recipient of the Siebel Scholars Award, Waibel Presidential Fellowship, Facebook PhD Fellowship, Center for ML and Health Fellowship, Rising Stars in Data Science, and three best paper/honorable mention awards at ICMI and NeurIPS workshops. Outside of research, he received the Alan J. Perlis Graduate Student Teaching Award for instructing courses on multimodal ML and advising students around the world in directed research.

This talk is organized by Samuel Malede Zewdu