PhD Proposal: Advancing Audio Processing in the Age of Large Language Models

Sreyan Ghosh

IRB-4109 Brendan Iribe Center for Computer Science and Engineering (IRB)

Tuesday, February 25, 2025, 2:00-3:30 pm

Abstract

Audio understanding is crucial for effective communication and decision-making, yet it has lagged behind language and vision due to data scarcity and the complexity of audio signals. While recent advances in Large Language Models (LLMs) have improved tasks such as Automatic Speech Recognition (ASR), audio captioning, and open-ended question answering, they still struggle with expert-level audio reasoning. Our research addresses this gap by enhancing audio perception and reasoning in LLMs through synthetic data, novel neural architectures, and improved audio representations.

In this talk, I will present our work on enhancing LLMs’ ability to process and reason about audio using better audio representations and synthetic data. I will also discuss our recent advances in understanding long and complex audio, which is essential for capturing temporal and contextual dependencies in auditory environments. Together, these methods not only strengthen fundamental audio tasks but also push the boundaries of expert-level audio reasoning and contextual comprehension in audio-language models.

Bio

Sreyan Ghosh is a PhD student in Computer Science at the University of Maryland, College Park, advised by Professor Dinesh Manocha and Professor Ramani Duraiswami. His research focuses on enhancing audio understanding and reasoning in AI models, with an emphasis on multimodal learning and low-resource learning. His work spans neural architectures, synthetic data generation, improved audio representations, and long-form audio reasoning. He has interned at Adobe, Microsoft, and NVIDIA and is a recipient of the NVIDIA Graduate Fellowship and the Outstanding Graduate Assistant Award.

This talk is organized by Migo Gui.