PhD Defense: Enhancing Machine Learning through Data-Centric Approaches: Efficiency, Generalization, and Trustworthiness

Mucong Ding

IRB-4107

Thursday, November 6, 2025, 10:00 am-12:00 pm

Abstract

My doctoral research investigates Data-Centric AI, exploring how a principled focus on the data pipeline can address persistent challenges in modern machine learning. This approach recognizes that systematically improving data quality, utility, and evaluation is a powerful method for enhancing model efficiency, generalization, and trustworthiness.
My work introduces several techniques to implement this data-centric philosophy. To improve efficiency and scalability, I developed methods for sample-efficient Graph Neural Network training using vector quantization, sketching, and coreset selection; explored calibrated dataset condensation for accelerating hyperparameter search; and investigated graphical models to improve the training stability of Generative Adversarial Networks. To enhance trustworthiness, I established WAVES, a benchmark for stress-testing invisible image watermarks. This benchmarking effort culminated in a NeurIPS 2024 competition that evaluated community-developed techniques, revealing their practical strengths and weaknesses. Finally, to improve generalization, I developed Easy2Hard-Bench, which profiles LLM reasoning with standardized difficulty labels, and SAIL, a self-improving online framework for data-efficient LLM alignment.

Bio

Mucong Ding is a PhD candidate in Computer Science at the University of Maryland, advised by Dr. Furong Huang. His research in Data-Centric AI aims to build more reliable and scalable machine learning systems by focusing on the quality and efficiency of the data pipeline.

This talk is organized by Migo Gui.