AI has significantly augmented the capabilities of knowledge workers across a range of tasks, such as composing emails, summarizing documents and videos, and analyzing data. While AI systems have reduced much of the manual workload, users still face challenges in interpreting and integrating AI-generated results into their work. We identify two key challenges, framed using the classical human-computer interaction (HCI) concept of the gulf of evaluation: (1) the inherent inaccuracies of AI models, and (2) the lack of meaningful explanations for AI-generated outputs. Bridging this gulf calls for the development of novel human-centered AI systems.
This dissertation explores techniques for narrowing the gulf of evaluation in human-AI interaction. We focus on two domains of knowledge work, content creation and data analysis, and introduce three human-centered AI systems we developed: TutoAI for mixed-media tutorial creation, COALA for event sequence data analysis, and Safeguard AI for machine learning model validation. Through empirical studies with users, we derive domain-specific insights and distill generalizable lessons for designing human-AI interactions that effectively bridge the gulf of evaluation.
Yuexi Chen is a Ph.D. candidate in the Human-Data Interaction Group, advised by Prof. Leo Zhicheng Liu. Her research focuses on developing human-centered AI systems that go beyond traditional chatbot interactions. She has previously interned at Adobe Research’s Document Intelligence Lab and Bosch Research’s Foundation Model-Powered AI Enablers Group.