My doctoral research investigates Data-Centric AI, exploring how a principled focus on the data pipeline can address persistent challenges in modern machine learning. The underlying premise is that systematically improving data quality, utility, and evaluation is an effective way to enhance model efficiency, generalization, and trustworthiness.
My work introduces several techniques that implement this data-centric philosophy. To improve efficiency and scalability, I developed methods for sample-efficient Graph Neural Network training using vector quantization, sketching, and coreset selection; explored calibrated dataset condensation to accelerate hyperparameter search; and investigated graphical models to improve the training stability of Generative Adversarial Networks. To enhance trustworthiness, I established WAVES, a benchmark for stress-testing invisible image watermarks. This line of work culminated in a NeurIPS 2024 competition that benchmarked community-developed techniques and revealed their practical strengths and weaknesses. Finally, to improve generalization, I developed Easy2Hard-Bench, which profiles LLM reasoning with standardized difficulty labels, and SAIL, a self-improving online framework for data-efficient LLM alignment.
Mucong Ding is a PhD candidate in Computer Science at the University of Maryland, advised by Dr. Furong Huang. His research in Data-Centric AI aims to build more reliable and scalable machine learning systems by focusing on the quality and efficiency of the data pipeline.

