Talks

PhD Defense: Parallel Algorithms for Processing Massive Texts and Graphs

Hamed Saleh

5105 Brendan Iribe Center for Computer Science and Engineering (IRB)

Monday, November 7, 2022, 3:00-5:00 pm

You are subscribed to this talk through .
You are watching this talk through .
You are subscribed to this talk. (unsubscribe, watch)
You are watching this talk. (unwatch, subscribe)
You are not subscribed to this talk. (watch, subscribe)

Abstract

Designing parallel algorithms with sublinear space is an inevitable scenario when the input data does not fit in the memory of available systems. In recent years, with the abundance of data and the increasing demand for large-scale data processing, the size of datasets easily surpasses that of the typical machine’s memory. Examples of the problem domain vary from social network graphs to DNA sequences. However, to address any problem subject to this restriction efficiently, we need alternative models of computation as the traditional RAM model is no longer an option. The main focus of our research is studying classical algorithms on texts and graphs in the Massively Parallel Computation model.

During the last decade, the Massively Parallel Computation (MPC) model attracted a considerable amount of attention. The MPC model was originally proposed to provide a theoretical foundation for algorithms implemented on modern large-scale data processing frameworks such as MapReduce, Hadoop, and Spark. The key idea behind this model, and also the aforementioned frameworks, is to use many machines to compensate for the shortage of space in individual machines. In this model, the data is distributed among a set of machines each with a sublinear memory, and the process consists of several rounds. In each round, machines perform an arbitrary amount of computation on their local data independently. At the end of each round, machines can communicate with each other. We propose constant-round MPC algorithms for Edge Coloring, Pattern Matching, constructing Suffix Trees, Longest Common Substring, and Longest Palindrome Substring.

Recently, the Adaptive Massively Parallel Computation (AMPC) model was introduced as an extension of MPC. Despite the significant progress in designing highly efficient MPC algorithms, the community is facing an impenetrable barrier due to a hardness conjecture that Graph Connectivity requires at least a logarithmic number of rounds. Inspired by this limitation, the AMPC model adds distributed hash-tables to the setup where each machine can adaptively read from them during each round. This tweak not only leads to a constant-round Graph Connectivity algorithm and a plethora of logarithmic improvements in many graph problems, but it is also practical thanks to the recent developments in hardware infrastructure and technologies such as RDMA and eRPC. We explore the limitations of this model by introducing a general Tree Contraction framework that translates well-studied applications in the context of PRAM into constant-round AMPC algorithms and solving the Minimum Cut problem in sublogarithmic rounds.

Examining Committee

Chair:	Dr. Mohammad Hajiaghayi
Dean's Representative:	Dr. Alexander Barg
Members:	Dr. David Mount
	Dr. Soheil Feizi
	Dr. Ming Lin
	Dr. Barna Saha

Bio

Hamed Saleh is a PhD student in the CS Department at the University of Maryland, working under the supervision of Prof. Hajiaghayi. His research interests include designing algorithms for solving combinatorial problems -- such as graph problems, problems on strings, and classic DPs -- in sublinear computational models such as Massively Parallel Computation (MPC), Streaming, Dynamic Algorithms, and Sublinear-space Algorithms.

This talk is organized by Tom Hurst