PhD Proposal: Towards Enabling GPU Performance Portability
Joshua Davis
Friday, May 9, 2025, 10:00 am-12:00 pm
Abstract

Supercomputers increasingly rely on a widening range of Graphics Processing Units (GPUs) to provide maximal computing power at minimal energy consumption and deployment cost. As a result, their users need performance portability: the ability of a single application to run with good performance (e.g., time to solution) across multiple supercomputers. Portable programming models allow a single piece of code to execute on multiple GPU types through an abstract interface, but performance portability nevertheless remains challenging to achieve. The extent to which each of the many available models actually enables performance portability is not well understood, and converting an existing application codebase to a new model is time-consuming, making it difficult to test and compare multiple options.


Furthermore, when a programming model fails to deliver the desired performance on a new GPU architecture, identifying ways to improve performance requires expert-level understanding of both the hardware and the programming model. In this proposal, we focus on these key challenges, dividing the problem of performance portability into three stages: planning, implementation, and optimization. We present results of a comparative study that provides a comprehensive overview of how well GPU programming models enable performance portability, along with a novel level of depth in the analysis of results. To address the tedium and expense of converting an application to a portable programming model, we also present a benchmarking effort that compares agentic and non-agentic translation methods using a range of open-source and commercial large language models (LLMs). We propose additional work to develop an advanced framework for token-efficient full-repository translation using agentic planning and feedback.


Finally, we propose a method to automatically tune portable GPU kernels by deriving performance models from optimizations identified through correctness-preserving code mutations, enabling lightweight, low-overhead tuning of GPU kernels on new hardware platforms.

Bio

Joshua (Josh) Davis is a PhD student in the Parallel Software and Systems Group in the Department of Computer Science at the University of Maryland, College Park. Josh is a National Science Foundation Graduate Research Fellow, and the primary subject of his doctoral research is performance portability: the ability of a single-source application to achieve good performance on a range of hardware platforms. His research interests also include large language model-driven techniques for translating scientific application software, automated GPU performance analysis and autotuning, and formal verification of parallel data structures via model checking. In 2023, Josh's research on evaluating the performance portability of GPU programming models won the Best Research Poster award at the ACM/IEEE Supercomputing Conference.

This talk is organized by Migo Gui