In this talk, I will share how my research explores a major shift in computer vision: from static, model-driven systems to interactive frameworks guided by user input. My work focuses on both visual and textual prompting strategies that make models more effective, intuitive, and adaptable.
First, I will present two visual prompting methods: SimpSON, which segments all objects similar to one selected with a single click, and MaGGIe, which uses coarse masks to resolve instance ambiguity in matting for multi-person scenes. Both approaches emphasize minimal user input, fast inference, and generalization across diverse scenarios.
Then, I will dive into interactive textual prompting with CoLLM, a language-driven framework for composed image retrieval that captures complex user intent without relying on manually annotated triplets. I will also introduce new training datasets generated with large language models to support this line of work.
Overall, my research highlights how user interactions, both visual and textual, can transform vision models into more precise, efficient, and user-friendly systems.
Chuong Huynh is a fourth-year Ph.D. student in Computer Science at the University of Maryland, College Park, where he is advised by Professor Abhinav Shrivastava. His research focuses on interactive computer vision, exploring how user input, whether visual or textual, can drive deeper and more adaptive image understanding. Passionate about building intelligent systems that collaborate with people, Chuong has applied his work in real-world settings through research internships at Adobe and on Amazon's Rufus team, where he contributed to user-centric AI technologies.