log in  |  register  |  feedback?  |  help  |  web accessibility
Logo
PhD Preliminary: High Performance Distributed Transactions for Multi-region Database Systems
Cuong Nguyen
Thursday, January 4, 2024, 2:30-4:00 pm Calendar
  • You are subscribed to this talk through .
  • You are watching this talk through .
  • You are subscribed to this talk. (unsubscribe, watch)
  • You are watching this talk. (unwatch, subscribe)
  • You are not subscribed to this talk. (watch, subscribe)
Abstract
The increased popularity of global applications has spurred the demand for a more advanced database system layer in multi-region cloud infrastructures. This layer should seamlessly abstract the distributed nature of the system behind a strong data consistency interface, enabling developers to focus on application logic without delving into complex data race issues. Simultaneously, it needs to exhibit resilience in the face of calamities, ranging from hardware failures to natural disasters that shut down entire data centers. However, these goals are usually at odds with the need for high performance. Guaranteeing a strong data consistency model necessitates additional coordination rounds between nodes to ensure the proper ordering of read and write operations. Enhancing availability and durability necessitates data replication across multiple locations. These measures, coupled with the potential for high-latency cross-region communication, can significantly degrade performance. Consequently, existing solutions typically either compromise on weaker consistency levels or incur performance penalties for achieving better guarantees.

In this dissertation, we propose a new deterministic concurrency control algorithm to address the performance limitation caused by cross-region network communication. While traditional database systems have to guard against non-deterministic failures by using expensive protocol like two-phase commit, exacerbated in a geographically distributed setting, deterministic concurrency control algorithms employ an upfront total ordering at a lower cost and eschew such a heavyweight protocol. Taking one step further, we devise a new deadlock resolution algorithm that reduces the number of cross-region network round-trips to an absolute minimum, significantly enhancing the efficiency and responsiveness of the system.

The design of deterministic database systems often rests on certain assumptions, such as the read and write set of each transaction should be known in advance. Recent database systems, including non-deterministic ones, also leverage these assumptions for performance gains. Nevertheless, the validity of these assumptions has primarily been supported by anecdotal evidence or intuition. To bridge this gap and systematically assess the prevalence of these assumptions, we conducted a large-scale empirical study across a diverse corpus of open-source, real-world applications. This study not only provides quantitative evidence for the cases where these assumptions hold true, but also guides future efforts to address the cases where they do not.

Finally, recognizing that deterministic concurrency control is still in its nascent stages for widespread adoption, we explore a general architectural design capable of equipping existing non-deterministic databases with strongly consistent multi-region transactions. We show that with minimal changes to a shared-storage database system, we can enable fast multi-region transactions while inheriting decades of expertly built and rigorously tested features from the original database system.
Bio

Cuong Nguyen is a PhD student in Computer Science at the University of Maryland, College Park, where he is advised by Prof. Daniel Abadi. His research interests lie in the intersection of database systems and distributed systems. He is currently exploring techniques to achieve high performance, scalability, and consistency in distributed transactions.

This talk is organized by Migo Gui