PhD Defense: Communication-efficient Hybrid Parallel Algorithms for Neural Network Training
Siddharth Singh
Friday, March 28, 2025, 10:10-11:30 am
Abstract

Deep learning has made significant advancements across various fields, driven by increasingly larger neural networks and massive datasets. However, these improvements come at the cost of high computational demands, necessitating the use of thousands of GPUs operating in parallel for model training. At this scale, the overhead associated with inter-GPU communication becomes a major bottleneck, severely limiting efficient hardware resource utilization.

This thesis addresses the challenge of communication in large-scale parallel deep learning. First, it introduces a novel four-dimensional hybrid parallel algorithm designed to minimize communication overhead while maintaining ease of use for practitioners. Second, it presents a topology-aware communication model that identifies optimal configurations for this algorithm based on the hardware architecture, improving efficiency and scalability. Finally, the thesis develops highly scalable implementations of collective communication primitives commonly used in distributed deep learning, further enhancing performance.

By tackling these critical communication challenges, this work contributes to more efficient deep learning training at scale, enabling faster model convergence and better resource utilization across large GPU clusters.
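To give a rough sense of the kind of hybrid parallelism discussed above, the sketch below enumerates ways to arrange a fixed number of GPUs into a four-dimensional process grid and picks the arrangement with the smallest estimated per-GPU communication volume. The dimension names, cost formulas, and constants are illustrative assumptions for this announcement only; they are not the algorithm or performance model presented in the thesis.

    # Hypothetical sketch: choosing a 4D process grid (data, pipeline, and two
    # tensor/model dimensions) that minimizes a toy communication estimate.
    from itertools import count

    def candidate_grids(num_gpus: int, num_dims: int = 4):
        """Enumerate every way to factor num_gpus into an ordered num_dims grid."""
        def factorizations(n, dims):
            if dims == 1:
                yield (n,)
                return
            for d in range(1, n + 1):
                if n % d == 0:
                    for rest in factorizations(n // d, dims - 1):
                        yield (d,) + rest
        yield from factorizations(num_gpus, num_dims)

    def estimated_comm_volume(grid, model_params: int, batch_activations: int):
        """Very rough per-GPU communication estimate for one training step.

        Assumes: gradients are all-reduced over the data-parallel dimension,
        sharded weights are all-gathered over the two tensor dimensions, and
        the pipeline dimension exchanges activations with neighboring stages.
        All constants are placeholders, not measured values.
        """
        data, pipe, tens_row, tens_col = grid
        shard = model_params / (pipe * tens_row * tens_col)  # parameters per GPU
        grad_allreduce = 2 * shard * (data - 1) / data        # ring all-reduce cost
        weight_allgather = shard * ((tens_row - 1) + (tens_col - 1))
        pipeline_p2p = batch_activations if pipe > 1 else 0
        return grad_allreduce + weight_allgather + pipeline_p2p

    if __name__ == "__main__":
        NUM_GPUS = 64
        MODEL_PARAMS = 1_000_000_000      # 1B-parameter model (illustrative)
        BATCH_ACTIVATIONS = 50_000_000    # activations exchanged between stages

        best = min(candidate_grids(NUM_GPUS),
                   key=lambda g: estimated_comm_volume(g, MODEL_PARAMS, BATCH_ACTIVATIONS))
        print("lowest-communication 4D grid (data, pipe, row, col):", best)

In practice, as the abstract notes, the choice of grid also depends on the hardware topology (intra-node versus inter-node bandwidth), which a simple volume count like this does not capture.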

Bio

Siddharth Singh is a fifth-year Ph.D. candidate in Computer Science at the University of Maryland, College Park. He earned his B.Tech and M.Tech in Computer Science and Engineering from the Indian Institute of Technology, Kharagpur. He is advised by Prof. Abhinav Bhatele, and his research focuses on the practical aspects of distributed training and inference for large neural networks. In the 2023-24 academic year, he received the Outstanding Graduate Research Assistant Award and led a team to the finals of the ACM Gordon Bell Prize.

This talk is organized by Migo Gui.