Multimodal Large Language Models (MLLMs) have achieved impressive performance across a wide array of tasks. However, these models are prone to hallucinations, that is, outputs that are inconsistent with the visual input or with established facts, which compromise their reliability. This thesis explores the phenomenon of hallucinations in MLLMs, focusing on their identification, underlying causes, and mitigation strategies.
We first propose a systematic evaluation framework to quantify and analyze hallucinations across multiple modalities, leveraging diverse metrics tailored to real-world scenarios. Building on this foundation, we introduce novel mitigation strategies that combine architectural improvements, fine-tuning techniques, and data augmentation to reduce hallucination rates without sacrificing model versatility. Finally, we identify open challenges and outline future research directions. This work provides a comprehensive roadmap for understanding and addressing hallucinations in MLLMs, contributing to the broader goal of enhancing the robustness and reliability of AI systems.
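As a minimal illustrative sketch (not the specific framework described above), the core quantity in such an evaluation is a hallucination rate: the fraction of model outputs whose claims are not supported by the visual evidence. The snippet below uses hypothetical data structures and helper names to show how claim-level and response-level rates might be computed once each response has been split into atomic claims and judged for grounding.

```python
# Minimal sketch: computing hallucination rates from binary judgments of
# whether each claim in a model response is supported by the image.
# The data structures and example data here are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class JudgedResponse:
    """One model response, split into atomic claims with support labels."""
    response_id: str
    claim_supported: List[bool]  # True if the claim is grounded in the image


def claim_level_rate(responses: List[JudgedResponse]) -> float:
    """Fraction of claims (across all responses) not supported by the image."""
    total_claims = sum(len(r.claim_supported) for r in responses)
    if total_claims == 0:
        return 0.0
    unsupported = sum(
        sum(1 for ok in r.claim_supported if not ok) for r in responses
    )
    return unsupported / total_claims


def response_level_rate(responses: List[JudgedResponse]) -> float:
    """Fraction of responses containing at least one unsupported claim."""
    if not responses:
        return 0.0
    hallucinated = sum(1 for r in responses if not all(r.claim_supported))
    return hallucinated / len(responses)


if __name__ == "__main__":
    # Toy example: two judged responses.
    judged = [
        JudgedResponse("r1", [True, True, False]),  # one unsupported claim
        JudgedResponse("r2", [True, True]),          # fully grounded
    ]
    print(f"claim-level hallucination rate:    {claim_level_rate(judged):.2f}")
    print(f"response-level hallucination rate: {response_level_rate(judged):.2f}")
```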
Fuxiao Liu is a Ph.D. candidate in the Computer Science Department at the University of Maryland, College Park, advised by Abhinav Shrivastava and Yaser Yacoob. His research focuses on developing customizable multimodal large language models that align with human intent. He has published several representative works, including HallusionBench, NVEagle, LRV-Instruction, MMC, and Visual News. After graduation, he will join NVIDIA ADLR as a Research Scientist.