In computer science, auto-tuning refers to techniques for automated software performance optimization. Traditional auto-tuning research focuses on identifying novel performance parameters that expand the optimization space for a given target software/platform combination, and on improving the automated search within that optimization space. This makes high-performance computing (HPC) a prime candidate for auto-tuning research, as it sits at the nexus of architectural diversity and performance criticality. However, the major successes of HPC auto-tuning to date involve tailoring memory access patterns to specific cache hierarchies. While important, this is just a small piece of the overall performance portability puzzle.
I argue that auto-tuning can expand and optimize a richer set of HPC application tuning parameters by combining novel non-intrusive programming language idioms with advanced lightweight online search techniques. I support this argument through four contributions to the field: this dissertation describes two techniques for expanding auto-tuning optimization spaces, and two techniques for distributing the auto-tuning search for parallel efficiency.
Ray Chen is a research scientist in the Cybersecurity department at Peraton Labs, where he currently serves as the principal investigator for several DARPA programs related to dynamic program analysis, binary instrumentation, automated testcase generation, reverse engineering, and vulnerability analysis. Prior to joining Peraton Labs, he (mis)spent his youth as a faculty research assistant in the Computer Science Department at the University of Maryland. Ray remembers those years fondly.