Advancing Computational Science using Extreme-Scale Parallel Computing
Abhinav Bhatele
IRB 0318
Friday, September 24, 2021, 3:00-4:00 pm
Abstract

Also on Zoom: https://umd.zoom.us/j/97114322433?pwd=TWw0OG8yV3ZTc1d2V0RlYXB6RkNWQT09

Parallel and high performance computing (HPC) have been critical to the advancement of computational science disciplines for several decades. Ensuring efficient use of HPC resources is important but challenging due to the increasing complexity of parallel codes and the diversity of hardware platforms. In addition, factors beyond the control of programmers and end users, such as contention for shared resources, can also impact performance. In this talk, I will discuss several research directions that share a common goal: improving the performance of parallel software and systems. I will first describe the challenges in designing and implementing a highly scalable parallel epidemic modeling code, and the benefits of using Charm++, an asynchronous, adaptive, task-based programming system. I will then discuss the phenomenon of performance variability on HPC systems and approaches to mitigating it. Finally, I will present a machine-learning-based job scheduler that trains on historical performance data and adapts its scheduling decisions with the aim of reducing the performance variability of parallel codes.

Bio

Abhinav Bhatele is an assistant professor in the Department of Computer Science and director of the Parallel Software and Systems Group at the University of Maryland, College Park. His research interests are broadly in systems and networks, with a focus on parallel computing and big data analytics. He has published research on programming models and runtimes, network design and simulation, applications of machine learning to parallel systems, and the analysis, modeling, and optimization of the performance of parallel software and systems. Abhinav has received best paper awards at Euro-Par 2009, IPDPS 2013, and IPDPS 2016. He received the IEEE TCSC Young Achievers in Scalable Computing award in 2014, the LLNL Early and Mid-Career Recognition award in 2018, and the NSF CAREER award in 2021.

This talk is organized by Richa Mathur.