As generative AI facilitates human–analytics system interaction, there is growing interest in natural interactions within immersive analytics. Natural interaction requires nuanced communication of intent between humans and the system throughout data analysis, across various input and output modalities. Speech can serve as the central interaction modality in this communication loop, drawing on its unique properties to enable more natural analytic interaction and to complement other embodied modalities. However, fundamental studies focused on speech as the primary interaction modality are lacking. Existing work approaches speech from a relatively narrow perspective, either treating it as merely auxiliary in multimodal interaction or relying primarily on text-based modalities instead. This dissertation addresses this gap by defining Speech-driven Immersive Analytics and examining it through three perspectives.
Speech-to-Intent aims to understand the most fundamental factors shaping the communication loop between humans and immersive analytics systems through two studies: EmbodiedNLI (IEEE VIS 2025), which uncovers users' speech patterns and the degree to which their speech relies on embodiment; and SIA (ACM Intelligent User Interfaces 2026), which introduces a Speech-driven Immersive Analytics framework. SIA focuses on the local interaction context in the immediate future, guiding users, especially novices, toward their next actions during the foraging phase.
Attention-to-Intent dives deeper into the data reasoning phase to tackle challenges of working memory load and loss of reasoning context, particularly for more experienced users. This study addresses these issues by focusing on the local interaction context in the immediate past, capturing both implicit and explicit signals from users' recent actions to surface previously unnoticed insights during data reasoning.
Context-to-Intent widens the lens to include long-term and situational context alongside these local interaction contexts, moving toward human–AI co-analysis. We discuss future directions for this research.
Hyemi Song is a fourth-year PhD student whose research defines and investigates Speech-driven Immersive Analytics to enable intelligent, natural interactions between humans and analytics systems in immersive environments, under the supervision of Dr. Amitabh Varshney. Before pursuing her PhD, she worked as a designer, UX researcher, and research fellow with several international companies and research institutions, including Microsoft (Responsible AI, Azure ML, Cognitive Search), Naver (Social Media and Search Engine), and MIT Senseable City Lab (Data Visualization Specialist for Urban Planning). Her academic background spans Computer Science (MS, University of Maryland, College Park), Digital+Media (MFA, Rhode Island School of Design), and Molecular Biology (BS).
Examining Committee Chair: Dr. Amitabh Varshney
Department Representative: Dr. Huaishu Peng
Members:
Dr. Niklas Elmqvist (Aarhus University)
Dr. Kirsten Whitley (Department of Defense)

