Attaining Sparsity in Large Language Models: Is it Easy or Hard?
Atlas Wang - University of Texas at Austin
2460 A. V. Williams
Thursday, October 26, 2023, 12:30-1:30 pm
Abstract

In the realm of contemporary deep learning, large pre-trained transformers have seized the spotlight, and understanding the frugal structures underlying these burgeoning models has become imperative. Although the tools of sparsity, such as pruning, the lottery ticket hypothesis, and sparse training, have enjoyed popularity and success in traditional deep networks, their efficacy in the new era of colossal pre-trained models, such as Large Language Models (LLMs), remains uncertain. This presentation aims to elucidate two seemingly contradictory perspectives: on one hand, we explore the notion that compressing LLMs is "easier" compared to earlier deep models; on the other hand, we delve into the aspects that make this endeavor "harder" in its own unique way. My goal is to convince you that I am indeed not contradicting myself.
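
For readers unfamiliar with the sparsity tools the abstract names, the following is a minimal, generic sketch of unstructured magnitude pruning (zeroing the smallest-magnitude weights), not the speaker's method; the function name magnitude_prune and the 50% sparsity target are assumptions chosen purely for illustration.

    import numpy as np

    def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
        """Zero out the smallest-magnitude entries until `sparsity` fraction are zero."""
        flat = np.abs(weights).ravel()
        k = int(sparsity * flat.size)                  # number of weights to remove
        if k == 0:
            return weights.copy()
        threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
        mask = np.abs(weights) > threshold             # keep only larger-magnitude weights
        return weights * mask

    # Example: prune a random 4x4 weight matrix to roughly 50% sparsity
    W = np.random.randn(4, 4)
    W_sparse = magnitude_prune(W, 0.5)
    print((W_sparse == 0).mean())                      # fraction of zeroed weights, ~0.5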

Bio

Atlas Wang (https://vita-group.github.io/) teaches and researches at UT Austin ECE (primary), CS, and Oden CSEM. He usually declares his research interest as machine learning but is never too sure what that means concretely. He has won some awards but is mainly proud of three things: (1) he has done some (hopefully) thought-provoking and practically meaningful work on sparsity, from inverse problems to deep learning; his recent favorites include essential sparsity (https://arxiv.org/abs/2306.03805), heavy-hitter oracle (https://arxiv.org/abs/2306.14048), and sparsity-may-cry (https://openreview.net/forum?id=J6F3Lg4Kdp); (2) he co-founded the Conference on Parsimony and Learning (CPAL), known as "the new conference for sparsity" to its community, and serves as its inaugural program chair (https://cpal.cc/); (3) he is fortunate enough to work with a sizable group of world-class students, all smarter than himself. He has so far graduated 14 Ph.D. students and postdocs who are well placed, including four (assistant) professors, and his students have altogether won seven prestigious Ph.D. fellowships (NSF GRFP, IBM, Apple, Adobe, Amazon, Qualcomm, and Snap), among many other honors.

This talk is organized by Samuel Malede Zewdu.