Modern cloud-native infrastructures and applications generate enormous volumes of telemetry data that are critical for performance monitoring, fault diagnosis, and resource optimization. However, as these systems scale to millions of components, such as containers and microservices, collecting, storing, and analyzing this telemetry becomes prohibitively expensive, straining the operational budgets, scalability, and real-time responsiveness of cloud telemetry systems.
This proposal explores an approximation-first approach to rethinking telemetry architecture, in which relaxing exact accuracy yields significant gains in efficiency and responsiveness without compromising practical utility. Building on this principle, my proposed research focuses on three directions. First, I will describe PromSketch, an approximate intermediate query caching system that reduces time-series query latency and cloud billing costs. Second, I will discuss NetMigrate, which achieves elastic and scalable key-value storage by leveraging approximate membership indexing within network switches. Finally, I will propose a new cloud telemetry architecture that reduces telemetry data ingestion costs by leveraging on-device approximation.
Zeying Zhu is a PhD student at the University of Maryland, College Park, advised by Prof. Alan (Zaoxing) Liu. Her research interests lie broadly in systems and networking, with a focus on bridging the gap between approximate algorithms (e.g., sampling, sketches) and practical computing systems to enable low latency, high scalability, low operational costs, and trustworthiness in cloud telemetry systems.

