Modern high-performance computing (HPC) systems operate at massive scales, comprising thousands of nodes equipped with high-end CPUs and GPUs to support complex workloads such as large language model training, quantum simulation, and high-resolution scientific simulations. As these systems continue to scale, two major challenges identified by the U.S. Department of Energy (DOE) become increasingly critical: managing the growing volume of data and ensuring robust error resilience.
My research addresses both challenges by developing flexible, efficient, and broadly applicable software solutions. On the data-efficiency side, I design ultra-fast GPU-based compression frameworks, such as cuSZp, that achieve high compression ratios while preserving data fidelity for diverse applications. On the reliability side, I develop low-overhead fault-tolerance techniques that enable effective detection of complex faults with minimal performance impact. Together, these contributions improve data efficiency and reliability at scale in next-generation HPC and AI systems.
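The "high compression ratios while preserving data fidelity" goal refers to error-bounded lossy compression: every reconstructed value is guaranteed to lie within a user-set error bound of the original. A minimal sketch of the underlying quantization idea follows; this is an illustration of the general SZ-family principle, not cuSZp's actual GPU implementation, and the function names are invented for this example.

```python
import numpy as np

def quantize(data, err_bound):
    # Map each value to an integer bin of width 2*err_bound.
    # Rounding to the nearest bin center guarantees
    # |reconstructed - original| <= err_bound for every element.
    return np.round(data / (2.0 * err_bound)).astype(np.int64)

def dequantize(codes, err_bound):
    # Reconstruct each value as the center of its bin.
    return codes * (2.0 * err_bound)

# Example: compress a small array with an absolute error bound of 1e-2.
data = np.linspace(0.0, 1.0, 8)
eb = 1e-2
recon = dequantize(quantize(data, eb), eb)
assert np.max(np.abs(recon - data)) <= eb
```

In a full compressor, the integer codes (which are highly repetitive for smooth scientific data) are then entropy-coded to achieve the actual size reduction; a GPU design like cuSZp additionally restructures these stages so thousands of threads can run them in parallel.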
Yafan Huang is a Ph.D. candidate in the Department of Computer Science at the University of Iowa, advised by Prof. Guanpeng Li. He has been a visiting graduate student at Argonne National Laboratory since 2021, where he works with Dr. Sheng Di and Dr. Franck Cappello. His research focuses on high-performance computing (HPC) and scientific applications, with particular interests in data compression, fault tolerance, parallel computing, and compiler optimizations. Yafan is the recipient of the 2025 ACM–IEEE CS George Michael Memorial HPC Fellowship and has received multiple Best Paper Award and finalist recognitions at systems conferences, including SC'22, SC'24, ICS'25, and LDAV'25.

