Rethinking AI Model Storage with Model-aware and Tensor-centric Compression
Yue Cheng
IRB 4105 or https://umd.zoom.us/j/93666933047?pwd=gWgqOgGbBP6laZclyURdDG2mNdArBt.1
Abstract
Large-scale model hubs such as Hugging Face and ModelScope have become core infrastructure for modern AI. They host millions of pretrained and fine-tuned models, especially large language models, and support a broad ecosystem of downstream applications across both industry and academia. But this infrastructure is coming under growing pressure: by late 2025, Hugging Face alone hosted more than 77 PB of model artifacts, and its storage footprint continues to grow exponentially, posing mounting cost and sustainability challenges.
In this talk, I will present a new perspective on AI model hubs: rather than storing model artifacts simply as collections of independent blob files, I will show how to uncover and exploit hidden structure across models at scale. At one level, models are related through fine-tuning and evolution. At another, models are composed of tensors, and tensors across different models exhibit subtle similarities. Together, these hidden patterns create new opportunities for rethinking how model hubs are stored and managed. I will first introduce ZipLLM, which redesigns storage reduction around model lineage. I will then show why model-level lineage alone is not enough: substantial reduction opportunities remain hidden at tensor granularity. To address this, I will present TensorDex, a tensor-centric model compression system that achieves significant lossless storage reduction for large-scale model hubs. Finally, I will argue that these results open up a new AI+data systems direction: tensor-centric AI infrastructure. I will conclude by outlining this vision and discussing future research directions.
Bio
Yue Cheng is an associate professor at the University of Virginia. His research interests include systems for AI and AI systems, serverless computing, and storage systems. His group has built a number of techniques to improve the efficiency, sustainability, and accessibility of cloud and AI platforms. Some of his work has led to large-scale deployments and adoption in public clouds, powering AI applications used by millions every day. He is a recipient of several awards and honors, including an Amazon Research Award, an NSF CAREER Award, a Meta Research Award, the 2022 IEEE CS TCHPC Early Career Researchers Award for Excellence in HPC, and a Samsung GRO Award.
This talk is organized by Samuel Malede Zewdu.