Deep generative models grow larger every year, yet each increase in scale makes it clearer that not all parameters are needed for every input. Activating a full multi-billion-parameter model to generate a simple image is wasteful; doing so millions of times per day is unsustainable. This proposal unifies three lines of work that move away from one-size-fits-all architectures toward input-adaptive, expert-based systems, addressing inefficiency from structural, algorithmic, and safety standpoints while preserving generation quality.
The first study makes the case that efficiency should begin at the prompt level. It introduces a routing mechanism that maps prompts to specialized sub-networks, converting a single text-to-image diffusion model into a mixture-of-experts over prompts. Each expert retains only the layers and channels relevant to its prompt class, achieving efficiency under resource constraints without manual architecture design or neural architecture search.
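The routing pattern can be sketched in a few lines. The following is a minimal illustration only, assuming a prompt embedding (e.g., from the model's text encoder), a small classifier as the router, and fixed binary channel masks standing in for the pruned structure; the names PromptRouter, ExpertPrunedBlock, and channel_mask are illustrative and not the study's actual implementation.

```python
# Minimal sketch of prompt-level expert routing (illustrative assumptions only).
import torch
import torch.nn as nn

class PromptRouter(nn.Module):
    """Maps a prompt embedding to one of K expert sub-networks."""
    def __init__(self, embed_dim: int, num_experts: int):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, num_experts)

    def forward(self, prompt_emb: torch.Tensor) -> int:
        # Hard top-1 routing: each prompt activates exactly one expert.
        return int(self.classifier(prompt_emb).argmax(dim=-1))

class ExpertPrunedBlock(nn.Module):
    """A dense block plus per-expert channel masks; only the channels
    associated with the routed prompt class stay active."""
    def __init__(self, channels: int, num_experts: int, keep_ratio: float = 0.5):
        super().__init__()
        self.proj = nn.Linear(channels, channels)
        # Random binary masks here stand in for masks obtained by pruning.
        masks = (torch.rand(num_experts, channels) < keep_ratio).float()
        self.register_buffer("channel_mask", masks)

    def forward(self, x: torch.Tensor, expert_id: int) -> torch.Tensor:
        return self.proj(x) * self.channel_mask[expert_id]

# Usage: route a prompt embedding, then run only the selected expert's channels.
router = PromptRouter(embed_dim=768, num_experts=4)
block = ExpertPrunedBlock(channels=768, num_experts=4)
prompt_emb = torch.randn(768)          # e.g., a CLIP text embedding
expert_id = router(prompt_emb)
out = block(torch.randn(1, 768), expert_id)
```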
The second study extends input-conditioned sparsity to autoregressive image generation models. It reuses the gating logic learned during dynamic sparsification as a top-1 router, converting a dense GPT-style decoder into a sparse mixture-of-experts over tokens. The approach exposes and reuses expert subnetworks already embedded in the pretrained model, maintains near-dense generation quality under a fixed per-token compute budget, and removes the recovery fine-tuning that compressed networks typically require.
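The token-level counterpart can be sketched in a similar way. The example below partitions a dense feed-forward layer's hidden units into expert groups and uses a linear gate as a top-1 token router. In this sketch the gate is freshly initialized and the masking only emulates sparse computation, whereas the study reuses gates learned during dynamic sparsification; the class name, grouping scheme, and dimensions are assumptions.

```python
# Minimal sketch of a dense FFN viewed as a token-level top-1 MoE (assumptions only).
import torch
import torch.nn as nn

class GatedFFNasMoE(nn.Module):
    """Partitions a dense FFN's hidden units into E equal groups and uses a
    linear gate (standing in for the learned sparsification gate) as a
    top-1 token router."""
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        assert d_hidden % num_experts == 0
        self.num_experts = num_experts
        self.chunk = d_hidden // num_experts
        self.fc1 = nn.Linear(d_model, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_model)
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); each token is routed to one expert group.
        expert_ids = self.gate(x).argmax(dim=-1)            # (tokens,)
        h = torch.relu(self.fc1(x))                         # (tokens, d_hidden)
        # Masking emulates running only the top-1 slice; a real implementation
        # would gather just that slice to realize the compute savings.
        mask = torch.zeros_like(h)
        for e in range(self.num_experts):
            sel = expert_ids == e
            mask[sel, e * self.chunk:(e + 1) * self.chunk] = 1.0
        return self.fc2(h * mask)

# Usage: 16 tokens, each nominally activating 1/4 of the FFN hidden units.
layer = GatedFFNasMoE(d_model=512, d_hidden=2048, num_experts=4)
tokens = torch.randn(16, 512)
out = layer(tokens)   # (16, 512), under a fixed per-token expert budget
```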
The third study focuses on safety in compressed models. It shows that fine-tuning, especially via distillation, can reintroduce unwanted behaviors, such as reproducing copyrighted or unsafe content, even after structural sparsification. To address this, it frames fine-tuning and unlearning as a single bilevel optimization problem, where a lower-level objective restores generative quality and an upper-level objective suppresses undesired concepts. This integrated approach outperforms traditional two-stage pipelines.
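As a rough illustration of this structure (not the proposal's exact formulation), the coupled objective can be written as a bilevel program in which upper-level variables $\omega$ (for instance, the fine-tuning initialization or per-layer weights) shape a lower-level fine-tuning problem that restores quality, while the upper level penalizes the undesired concepts:

\[
\min_{\omega}\; \mathcal{L}_{\text{forget}}\big(\theta^{*}(\omega)\big)
\qquad \text{s.t.} \qquad
\theta^{*}(\omega) \in \arg\min_{\theta}\; \mathcal{L}_{\text{quality}}\big(\theta;\omega\big),
\]

where $\mathcal{L}_{\text{quality}}$ stands for the distillation or fine-tuning loss and $\mathcal{L}_{\text{forget}}$ for the unlearning loss on undesired concepts. Solving the two levels jointly is what replaces the traditional fine-tune-then-unlearn two-stage pipeline.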
Reza Shirkavand is a third-year PhD student in the Department of Computer Science at the University of Maryland, College Park, advised by Dr. Heng Huang. His research focuses on designing efficient and adaptive generative models.