PhD Proposal: Rearchitecting cloud telemetry through approximation-first techniques
Zeying Zhu
IRB-5161 https://umd.zoom.us/j/95503569537?pwd=7cqT8mNQ2OFDh3YqEBoCGbxnROH8ba.1
Tuesday, November 11, 2025, 1:30-3:00 pm
Abstract

Modern cloud-native infrastructures and applications generate enormous volumes of telemetry data that are critical for performance monitoring, fault diagnosis, and resource optimization. However, as these systems scale to millions of components, such as containers and microservices, collecting, storing, and analyzing high-volume telemetry data becomes prohibitively expensive, posing significant challenges to the operational costs, scalability, and real-time responsiveness of cloud telemetry systems.

This proposal explores an approximation-first approach to rethinking telemetry architecture, where relaxing exact accuracy enables significant gains in efficiency and responsiveness without compromising practical use. Building on this principle, my proposed research focuses on three aspects. First, I will describe PromSketch, an approximate intermediate query caching system for reducing time series query latency and cloud billing costs. Second, I will discuss how NetMigrate achieves an elastic and scalable key-value storage system by leveraging approximate membership indexing within network switches. Finally, I will propose a new cloud telemetry architecture that reduces telemetry data ingestion costs through on-device approximation.

Bio

Zeying Zhu is a PhD student at the University of Maryland, College Park, advised by Prof. Alan (Zaoxing) Liu. Her research interests are broadly in systems and networking, with a focus on bridging the gap between approximate algorithms (e.g., sampling, sketches) and practical computing systems, aiming to enable low latency, high scalability, low operational costs, and trustworthiness in cloud telemetry systems.

This talk is organized by Migo Gui