Robust multimodal learning is essential for developing autonomous agents that can operate reliably in complex and uncertain environments. Although recent advances have improved multimodal perception, reasoning, and action, major challenges remain in achieving robustness and generalization across varying domains and agent configurations, and under incomplete or noisy sensor data.
This research integrates several complementary advancements to address these challenges. First, it introduces a data-driven optimal control approach that enables agents to manage uncertainty in both multimodal sensor inputs and system dynamics without relying on manually specified prior knowledge. Second, it develops integrated multimodal representations that fuse visual, physical, and temporal information, allowing agents to generalize to novel objects and situations. Third, in multi-agent settings, it employs collaborative learning with knowledge distillation, enabling agents to train with full sensory information while remaining robust when some modalities or agents are absent at deployment. Finally, it proposes an adaptive guidance mechanism that learns effectively from imperfect or noisy supervision, preventing overfitting to unreliable guidance signals.
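To make the third idea concrete, the sketch below shows one generic way (not necessarily the specific method developed in this research) to pair modality dropout with knowledge distillation: a student policy that may lose modalities at deployment is trained to mimic a teacher that sees all of them. The model names, feature dimensions, dropout rate, and loss choice are illustrative assumptions only.

```python
# Minimal illustrative sketch: cross-modal knowledge distillation with random
# modality dropout, so a student remains usable when sensors or agents drop out.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionPolicy(nn.Module):
    """Encodes each modality, fuses by summation, and outputs action logits."""
    def __init__(self, dims, hidden=64, n_actions=4):
        super().__init__()
        self.encoders = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(hidden, n_actions))

    def forward(self, inputs, mask):
        # mask[:, m] == 0 drops modality m (simulates a missing sensor or agent)
        fused = sum(enc(x) * mask[:, m:m + 1]
                    for m, (enc, x) in enumerate(zip(self.encoders, inputs)))
        return self.head(fused)

dims = [32, 16, 8]                       # e.g., visual, physical, temporal features
teacher = FusionPolicy(dims)             # in practice, pretrained on full modalities
student = FusionPolicy(dims)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(100):                     # toy loop on random data
    batch = [torch.randn(64, d) for d in dims]
    full = torch.ones(64, len(dims))                    # teacher sees everything
    drop = (torch.rand(64, len(dims)) > 0.3).float()    # student may lose modalities
    with torch.no_grad():
        t_logits = teacher(batch, full)
    s_logits = student(batch, drop)
    # Distillation: match the student's predictions to the teacher's soft targets
    loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1), reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
```

At deployment the student is queried alone with whatever modality mask reflects the sensors and teammates actually available, which is the robustness property the collaborative training is meant to provide.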
By unifying optimal control, multimodal representation learning, multimodal multi-agent collaboration, and adaptive guidance, this research aims to develop adaptive and robust autonomous agents capable of operating in real-world, uncertain environments.
Rui Liu is a Ph.D. student in Computer Science at the University of Maryland, College Park, advised by Prof. Pratap Tokekar and Prof. Ming Lin. His research interests include Multimodal Learning, Reinforcement Learning, and Imitation Learning, and his work spans Multimodal LLM Reasoning and Robotics. He aims to build robust autonomous agents.

