PhD Proposal: Hallucinations in Multimodal Large Language Models: Evaluation, Mitigation, and Future Directions
Fuxiao Liu
Remote
Thursday, January 9, 2025, 10:00-11:30 am
Abstract

Multimodal Large Language Models (MLLMs) have achieved impressive performance across a wide array of tasks. However, these models are prone to hallucinations that compromise their reliability. This thesis explores the phenomenon of hallucinations in MLLMs, focusing on their identification, underlying causes, and mitigation strategies.

We first propose a systematic evaluation framework to quantify and analyze hallucinations across multiple modalities, leveraging diverse metrics tailored to real-world scenarios. Building on this foundation, we introduce novel mitigation strategies, combining architectural improvements, fine-tuning techniques, and data augmentation approaches to reduce hallucination rates without sacrificing model versatility. Finally, we identify open challenges and outline future research directions. This work provides a comprehensive roadmap for understanding and addressing hallucinations in MLLMs, contributing to the broader goal of enhancing the robustness and reliability of AI systems.

Bio

Fuxiao Liu is a fourth-year PhD student in Computer Science at the University of Maryland, College Park, working with Yaser Yacoob, Abhinav Shrivastava, and Tianyi Zhou. His research focuses on developing customizable multimodal large language models that align with human intent. His work has been published in venues such as ICLR, CVPR, EMNLP, and ACL. Fuxiao has also gained industry experience through internships at leading companies, including NVIDIA, Adobe, and Microsoft.

This talk is organized by Migo Gui.