A key driver of deep learning's success is the growing size of models that achieve high accuracy. At the same time, training such complex models on large data sets is difficult. It is therefore crucial to accelerate training with distributed systems and architectures, where communication and heterogeneity are the two key challenges. In this talk, I will present two heterogeneity-aware decentralized training protocols that avoid communication bottlenecks. Specifically, Hop supports arbitrary iteration gaps between workers through a novel queue-based synchronization mechanism, tolerating heterogeneity with system techniques. Prague uses randomized communication to tolerate heterogeneity, with a new training algorithm based on partial reduce, an efficient communication primitive. Moreover, I will present systematic tensor partitioning for training on heterogeneous accelerator arrays (e.g., GPUs/TPUs). We believe that our principled approaches are crucial for achieving high-performance and efficient distributed training.
Xuehai Qian is an assistant professor at the University of Southern California. His research interests include domain-specific systems and architectures, performance tuning and resource management of cloud systems, and parallel computer architectures. He received his Ph.D. from the University of Illinois at Urbana-Champaign and was a postdoc at UC Berkeley. He is the recipient of the W.J. Poppelbaum Memorial Award at UIUC, the NSF CRII and CAREER Awards, and the inaugural ACSIC (American Chinese Scholar In Computing) Rising Star Award.