Current HPC environments and applications are rigid, and MPI’s inability to efficiently support malleability, i.e., the ability to grow and shrink the computational resources associated with a job at runtime, is a significant part of the problem. While this is unlikely to change for exascale systems, a post-exascale world will require a more flexible approach, e.g., to support a greater level of fault tolerance, to adjust to changing levels of available resources, or to match more complex workflows. For MPI to maintain its dominant role in HPC, it will have to change and become more adaptive. In this talk, I will discuss the challenges facing MPI in these scenarios as well as several approaches that are first steps towards supporting malleability in MPI. They will open the door for MPI both to support a new generation of applications and to provide more flexible runtime support for higher-level programming models.
Martin Schulz is a Full Professor and Chair for Computer Architecture and Parallel Systems at the Technische Universität München (TUM), which he joined in 2017, as well as a member of the board of directors at the Leibniz Supercomputing Centre. Prior to that, he held positions at the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory (LLNL) and Cornell University. He earned his Doctorate in Computer Science in 2001 from TUM and a Master of Science in Computer Science from UIUC. Martin has published over 200 peer-reviewed papers and currently serves as the chair of the MPI Forum, the standardization body for the Message Passing Interface. His research interests include parallel and distributed architectures and applications; performance monitoring, modeling and analysis; memory system optimization; parallel programming paradigms; tool support for parallel programming; power-aware parallel computing; and fault tolerance at the application and system level. Martin was a recipient of the IEEE/ACM Gordon Bell Award in 2006 and an R&D 100 award in 2011.