Intelligence is characterized by the ability to understand and reason about the world around us. Deep learning has excelled at pattern recognition tasks such as image classification and object recognition. However, it falls short of the deeper understanding needed for complex reasoning and physical interaction. In this talk, I will introduce a framework, neuro-symbolic embodied AI, for bringing intelligence to immersive media. This framework aims to narrow the gap between machine and human intelligence in data efficiency, flexibility, and generalization. My approach combines three components: neural networks, which extract patterns from data; symbolic programs, which represent and reason over prior knowledge; and physics engines, which support inference and planning. Together, they enable machines to reason about underlying objects and their dynamics, and to master new skills efficiently and flexibly. I will conclude by introducing ThreeDWorld, a multi-modal interactive world simulator, and highlighting its potential for promoting research and education in immersive media design.
Chuang Gan is a principal research staff member at the MIT-IBM Watson AI Lab. He is also a visiting research scientist at MIT, working closely with Prof. Antonio Torralba and Prof. Josh Tenenbaum. Previously, he completed his Ph.D. with the highest honor at Tsinghua University, supervised by Prof. Andrew Chi-Chih Yao. His research interests sit at the intersection of computer vision, machine learning, and robotics. His work has been recognized with a Microsoft Fellowship and a Baidu Fellowship, and has received media coverage from BBC, WIRED, Forbes, and MIT Tech Review. He has served as an area chair for CVPR, ICCV, ECCV, ICML, ICLR, NeurIPS, and ACL, and as an associate editor of IEEE Transactions on Image Processing.