Over the past decade, there has been a surge in the use of large deep networks that demand significant memory and computation resources. As networks continue to scale, they rely on ever more data, which is being created and transmitted at an exponential rate. It is infeasible to store the data and networks for multiple applications on a single resource-constrained edge device, and many networks are also computationally expensive and slow to execute. My talk will introduce a unified framework that reduces network memory and computation costs simultaneously. I will then show that it generalizes to data compression as well.

In the first part, I will introduce a framework for compressing convolutional neural networks (CNNs). We use quantized latent representations of the network weights to improve storage efficiency on disk, while simultaneously inducing sparsity in the network to improve computational efficiency.

The second part of my talk will focus on applying the framework to data compression via implicit neural representations (INRs). We develop a method that compresses videos autoregressively, exploiting spatiotemporal redundancies; doing so improves encoding time and decoding speed compared to prior INR-based approaches. Next, I will discuss extending the compression framework to other data modalities, such as images, videos, and NeRFs, by applying it to feature-grid-based INR models. As part of my ongoing research, I will talk about compressing 3D Gaussian splats for fast training and rendering of 3D scenes. I will also briefly discuss a post-training network compression framework that uses block sparsity to improve the efficiency of matrix multiplication in large-scale transformers.
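To make the quantization-plus-sparsity idea concrete, here is a minimal, hypothetical PyTorch sketch of a convolutional layer whose kernel is decoded from quantized latents using a straight-through estimator, with an L1 penalty on the decoded weights to encourage sparsity. The class name `LatentConv2d`, the quantization step size, and the loss weights are illustrative assumptions, not the implementation presented in the talk.

```python
# Illustrative sketch (not the speaker's actual method): a conv layer whose
# kernel is decoded from quantized latents. Storing integer latents instead
# of float32 weights reduces on-disk size; an L1 term on the decoded kernel
# promotes zeros, i.e. computational sparsity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k, step=0.05):
        super().__init__()
        # Latent tensor that parameterizes the kernel (hypothetical setup).
        self.latent = nn.Parameter(torch.randn(out_ch, in_ch, k, k))
        self.step = step  # quantization step size (illustrative choice)

    def decoded_weight(self):
        # Straight-through estimator: quantize in the forward pass, but let
        # gradients flow to the latents unchanged in the backward pass.
        quantized = torch.round(self.latent / self.step) * self.step
        return self.latent + (quantized - self.latent).detach()

    def forward(self, x):
        return F.conv2d(x, self.decoded_weight(), padding="same")

# Toy usage: a task loss plus an L1 sparsity penalty on the decoded kernel.
layer = LatentConv2d(3, 8, 3)
x = torch.randn(1, 3, 32, 32)
out = layer(x)
loss = out.square().mean() + 1e-3 * layer.decoded_weight().abs().sum()
loss.backward()
```

In this sketch, only the integer-valued latents (and the step size) would need to be stored on disk, which is where the storage savings would come from; the L1 term drives decoded weights toward zero, enabling sparse computation at inference.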
Sharath Girish is a PhD student at the University of Maryland, College Park, advised by Prof. Abhinav Shrivastava. His research mainly focuses on accelerating and compressing deep networks. He is also interested in learning efficient and compact neural representations for data.