Provably Efficient and Scalable Shared-Memory Graph Processing
Laxman Dhulipala
https://umd.zoom.us/j/94543765116?pwd=clY3MVV5Z1g4T2xpdnJMdjFiMFhYdz09
Tuesday, March 2, 2021, 1:00-2:00 pm
Abstract

Graph processing is a fundamental tool in many computational disciplines due to the widespread availability of graph data. However, processing large graphs quickly and cost-effectively is a major challenge: existing approaches capable of processing very large graphs often have high computational cost, solve only a limited set of problems, and offer weak theoretical guarantees. Similarly, existing approaches for dynamic or streaming graphs often employ ad hoc algorithms with unknown theoretical costs and suboptimal performance. This talk will show how shared-memory algorithms can serve as the foundation for a graph processing toolkit for static and evolving graphs that is cost-effective, has strong theoretical guarantees, and achieves state-of-the-art performance.

In the first part of this talk, I will describe a shared-memory approach for static graph processing, built on a rich interface for parallel graph processing that extends the Ligra interface with provably-efficient primitives. This interface enables simple and provably-efficient shared-memory implementations of over 20 fundamental graph problems. The implementations, which are publicly available as part of the Graph Based Benchmark Suite, solve these problems on the largest publicly available graph, the WebDataCommons hyperlink graph with over 200 billion edges, in a matter of seconds to minutes on a commodity multicore machine. Compared to existing large-scale graph processing results, our results apply to a much broader set of problems, use orders of magnitude fewer resources, and in many cases run an order of magnitude faster.
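
To make the frontier-based programming model concrete, here is a small, self-contained C++ sketch written for this summary; it is not code from Ligra or the Graph Based Benchmark Suite, and the names edgeMap, VertexSubset, and BFS as used here are simplified stand-ins for the real primitives. The sketch shows the two core ideas the interface builds on: a vertex subset representing a frontier, and an edgeMap that applies a user function over the edges leaving the frontier, with breadth-first search written as a few lines on top. The real implementations are parallel and provably work-efficient, and switch between sparse and dense traversal strategies; this sequential toy version only illustrates the shape of the API.

// Toy sketch of a Ligra-style interface: a sparse frontier (vertexSubset)
// plus an edgeMap primitive, with BFS written on top. Sequential version for
// illustration only; names and signatures are simplified assumptions.
#include <cstdint>
#include <iostream>
#include <vector>

// Compressed sparse row (CSR) graph: the out-neighbors of v are
// edges[offsets[v] .. offsets[v+1]).
struct Graph {
  std::vector<uint32_t> offsets;  // size n + 1
  std::vector<uint32_t> edges;    // size m
  uint32_t n() const { return static_cast<uint32_t>(offsets.size()) - 1; }
};

using VertexSubset = std::vector<uint32_t>;  // sparse frontier representation

// edgeMap: for each edge (u, v) with u in the frontier and cond(v) true,
// apply f(u, v); if f returns true, v joins the output frontier.
template <class F, class Cond>
VertexSubset edgeMap(const Graph& G, const VertexSubset& frontier, F f, Cond cond) {
  VertexSubset out;
  for (uint32_t u : frontier)
    for (uint32_t i = G.offsets[u]; i < G.offsets[u + 1]; ++i) {
      uint32_t v = G.edges[i];
      if (cond(v) && f(u, v)) out.push_back(v);
    }
  return out;
}

// BFS expressed with edgeMap: repeatedly expand the frontier, recording parents.
std::vector<int64_t> BFS(const Graph& G, uint32_t src) {
  std::vector<int64_t> parent(G.n(), -1);
  parent[src] = src;
  VertexSubset frontier = {src};
  while (!frontier.empty()) {
    frontier = edgeMap(
        G, frontier,
        [&](uint32_t u, uint32_t v) { parent[v] = u; return true; },  // visit v
        [&](uint32_t v) { return parent[v] == -1; });                 // v unvisited?
  }
  return parent;
}

int main() {
  // Example graph: 0 -> {1, 2}, 1 -> {3}, 2 -> {3}, 3 -> {}.
  Graph G{{0, 2, 3, 4, 4}, {1, 2, 3, 3}};
  std::vector<int64_t> parent = BFS(G, 0);
  for (uint32_t v = 0; v < G.n(); ++v)
    std::cout << "parent[" << v << "] = " << parent[v] << "\n";
}

The design point this sketch is meant to convey is that each algorithm becomes a short composition of mapped operations over frontiers, while the cost of each primitive can be bounded in the work-depth model; the additional provably-efficient primitives in the full interface are layered on top of operations of this kind.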

In the second part of this talk, I will describe a provably-efficient approach for streaming graph processing, centered on a new compressed purely-functional tree data structure, called a C-tree, which admits efficient parallel batch updates and enables a dynamic graph representation based on nested trees. Using this representation, I will describe a serializable graph-streaming system called Aspen that can concurrently apply updates and queries. Compared to existing work, Aspen achieves orders of magnitude higher update rates while using less memory. Aspen scales to the largest publicly available graphs, and can concurrently update and analyze the WebDataCommons hyperlink graph on a single commodity multicore machine with a terabyte of memory.
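
For intuition about how a C-tree stays shallow and compressible, the following toy C++ sketch (written for this summary under assumed parameters, not Aspen's implementation; hash64, isHead, buildChunks, and the parameter b are illustrative names chosen here) shows the hash-based chunking rule: an element becomes a chunk "head" with probability roughly 1/b, and each head owns the run of elements up to the next head. Only the heads need to live in the purely-functional search tree, and each chunk can be stored contiguously and compressed, for example with difference encoding.

// Toy sketch of the hash-based chunking behind a C-tree. An element becomes a
// chunk "head" with probability about 1/b; each head owns the run of elements
// up to the next head. Heads would live in a purely-functional search tree
// (a std::map stands in for it here). Written for this summary; not Aspen's code.
#include <cstdint>
#include <iostream>
#include <map>
#include <vector>

// A simple 64-bit mixing hash; any well-distributed hash works here.
uint64_t hash64(uint64_t x) {
  x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
  x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
  x ^= x >> 33; return x;
}

// An element is a head with probability roughly 1/b.
bool isHead(uint64_t x, uint64_t b) { return hash64(x) % b == 0; }

// Partition a sorted sequence (e.g. a neighbor list) into hash-determined
// chunks: head -> the head plus the following non-heads. Elements that appear
// before the first head go into a separate prefix chunk.
std::map<uint64_t, std::vector<uint64_t>> buildChunks(
    const std::vector<uint64_t>& sorted, uint64_t b,
    std::vector<uint64_t>& prefix) {
  std::map<uint64_t, std::vector<uint64_t>> chunks;
  std::vector<uint64_t>* current = &prefix;
  for (uint64_t x : sorted) {
    if (isHead(x, b)) current = &chunks[x];  // start a new chunk at each head
    current->push_back(x);
  }
  return chunks;
}

int main() {
  // Chunk the neighbor list {0, 1, ..., 63} with expected chunk size b = 8.
  std::vector<uint64_t> neighbors;
  for (uint64_t v = 0; v < 64; ++v) neighbors.push_back(v);
  std::vector<uint64_t> prefix;
  auto chunks = buildChunks(neighbors, 8, prefix);
  std::cout << "prefix elements: " << prefix.size()
            << ", chunks (tree entries): " << chunks.size() << "\n";
}

Because chunk boundaries depend only on element hashes, inserting or deleting an element rebuilds only the chunk it falls into plus a path-copied spine of the outer tree, which is what makes parallel batch updates cheap and lets readers keep working against an immutable snapshot while updates proceed.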

Bio

Laxman is currently a Postdoctoral Researcher at MIT CSAIL, where he works with Prof. Julian Shun. He recently received his Ph.D. from Carnegie Mellon University, where he was advised by Prof. Guy Blelloch. He works on high-performance parallel algorithms and systems for static, dynamic, and streaming graphs, with a focus on both practical and theoretical efficiency. His work on graph processing has been recognized by a Best Paper Award at SPAA'18 and a Distinguished Paper Award at PLDI'19.

This talk is organized by Richa Mathur