Talks

PhD Preliminary: Enhancing Modern Query Federation Systems: Execution Optimization, Performance Prediction, and Systems Assessment

Chujun Song

IRB-5165 or via Zoom https://umd.zoom.us/my/cjsong

Thursday, January 18, 2024, 10:00 am-12:00 pm

You are subscribed to this talk through .
You are watching this talk through .
You are subscribed to this talk. (unsubscribe, watch)
You are watching this talk. (unwatch, subscribe)
You are not subscribed to this talk. (watch, subscribe)

Abstract

Query federation engines typically need to execute queries against data stored in systems with query processing capabilities, such as DBMSs. Given the same query, we could design a query federation engine to retrieve data directly from the DBMS and then execute the query within the query federation engine (a design we refer to as 'push data to code'). Alternatively, we could offload the majority of the query processing work to the DBMS, only fetching the results back (a design we refer to as 'push code to data'). However, these approaches represent extreme designs where one side handles most of the work while the other remains largely idle: the DBMS is idle in the 'push data to code' approach, and the query federation engine is idle in the 'push code to data' approach. Our work explores a middle ground, engaging both the query federation engine and the DBMS in query processing to maximize resource utilization. We focus specifically on join processing, as joins are typically the most resource-intensive part of query processing. To achieve this, we designed novel join algorithms that involve both sides in the query processing. These algorithms were implemented on the query federation engine Trino and evaluated against the 'push data to code' and 'push code to data' scenarios. We found that our new design indeed achieves lower latency. Additionally, we developed dynamic algorithms that can decide in runtime whether to execute the next part of the process on the query federation end or the DBMS end, adapting to changes in CPU pressure on both sides. Experiments showed that this dynamic approach outperforms static algorithms in terms of query latency.

Bio

Chujun Song is a PhD student in Computer Science at the University of Maryland, College Park, advised by Prof. Daniel Abadi. He is interested in database systems, especially in query engines. He has interned on the AWS Athena team before working on query optimization and planner refinement and LinkedIn Systems Lab working on query performance prediction for smart query routing systems.

This talk is organized by Migo Gui