Distilling Diversity and Control in Diffusion Models
- URL: http://arxiv.org/abs/2503.10637v2
- Date: Fri, 14 Mar 2025 13:11:59 GMT
- Title: Distilling Diversity and Control in Diffusion Models
- Authors: Rohit Gandikota, David Bau
- Abstract summary: Distilled diffusion models suffer from a critical limitation: reduced sample diversity compared to their base counterparts. We show that despite this diversity loss, distilled models retain the fundamental concept representations of base models. We introduce diversity distillation, a hybrid inference approach that strategically employs the base model for only the first critical timestep before transitioning to the efficient distilled model.
- Score: 27.352868008401614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distilled diffusion models suffer from a critical limitation: reduced sample diversity compared to their base counterparts. In this work, we uncover that despite this diversity loss, distilled models retain the fundamental concept representations of base models. We demonstrate control distillation, where control mechanisms like Concept Sliders and LoRAs trained on base models can be seamlessly transferred to distilled models and vice versa, effectively distilling control without any retraining. This preservation of representational structure prompted our investigation into the mechanisms of diversity collapse during distillation. To understand how distillation affects diversity, we introduce Diffusion Target (DT) Visualization, an analysis and debugging tool that reveals how models predict final outputs at intermediate steps. Through DT-Visualization, we identify generation artifacts and inconsistencies, and demonstrate that initial diffusion timesteps disproportionately determine output diversity, while later steps primarily refine details. Based on these insights, we introduce diversity distillation, a hybrid inference approach that strategically employs the base model for only the first critical timestep before transitioning to the efficient distilled model. Our experiments demonstrate that this simple modification not only restores the base model's diversity in distilled models but, surprisingly, exceeds it, while maintaining nearly the computational efficiency of distilled inference, all without requiring additional training or model modifications. Our code and data are available at https://distillation.baulab.info
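The hybrid schedule described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' released code: it assumes both models are epsilon-predicting denoisers that share one DDIM-style noise schedule (`alpha_bars`, noisiest first), and that the base model simply takes the place of the distilled model for the first `base_steps` steps. How the paper actually maps timesteps between the two models is not stated here. The names `hybrid_ddim_sample`, `predict_x0`, and `base_steps` are invented for this sketch. `predict_x0` uses the standard relation $\hat{x}_0 = (x_t - \sqrt{1-\bar\alpha_t}\,\hat\epsilon)/\sqrt{\bar\alpha_t}$ to record the model's estimate of the final output at every step, which is one plausible way to obtain the kind of intermediate prediction DT-Visualization inspects.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical denoiser signature: (noisy sample x_t, step index) -> predicted noise.
# Real text-to-image models also take text conditioning; omitted for brevity.
Denoiser = Callable[[List[float], int], List[float]]


def predict_x0(x_t: List[float], eps_hat: List[float], alpha_bar: float) -> List[float]:
    """Estimate the clean output from x_t using the epsilon-parameterised identity
    x0 = (x_t - sqrt(1 - alpha_bar) * eps) / sqrt(alpha_bar). Requires alpha_bar > 0."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - noise * e) / signal for x, e in zip(x_t, eps_hat)]


def hybrid_ddim_sample(
    base: Denoiser,
    distilled: Denoiser,
    alpha_bars: List[float],
    x_T: List[float],
    base_steps: int = 1,
) -> Tuple[List[float], List[List[float]], List[str]]:
    """Deterministic DDIM sampling (eta = 0) that calls `base` for the first
    `base_steps` steps and `distilled` for the rest.

    Returns the final sample, the x0 prediction made at every step (what a
    DT-Visualization-style tool would render), and which model ran at each step.
    """
    x = list(x_T)
    x0_trace: List[List[float]] = []
    used: List[str] = []
    for i, a_t in enumerate(alpha_bars):
        use_base = i < base_steps
        model = base if use_base else distilled
        used.append("base" if use_base else "distilled")
        eps = model(x, i)
        x0 = predict_x0(x, eps, a_t)
        x0_trace.append(x0)
        # Next (less noisy) level; after the last step we land on the clean sample.
        a_prev = alpha_bars[i + 1] if i + 1 < len(alpha_bars) else 1.0
        x = [math.sqrt(a_prev) * x0_j + math.sqrt(1.0 - a_prev) * e_j
             for x0_j, e_j in zip(x0, eps)]
    return x, x0_trace, used


if __name__ == "__main__":
    # Toy stand-ins for the two networks (hypothetical, not real models).
    def base_model(x, t):
        return [0.3 * v for v in x]

    def distilled_model(x, t):
        return [0.1 * v for v in x]

    schedule = [0.05, 0.3, 0.7, 0.95]  # hypothetical alpha_bar per step, noisiest first
    sample, trace, used = hybrid_ddim_sample(
        base_model, distilled_model, schedule, [1.0, -0.5], base_steps=1
    )
    print(used)  # ['base', 'distilled', 'distilled', 'distilled']
```

With `base_steps=1`, only the first network evaluation goes to the base model, so the total cost stays close to that of pure distilled sampling. Real pipelines add text conditioning, classifier-free guidance, and latent decoding, all of which are omitted here.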
Related papers
- Antidistillation Sampling [98.87756003405627]
Models that generate extended reasoning traces inadvertently produce rich token sequences that can facilitate model distillation.
Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without compromising model performance.
Antidistillation sampling renders reasoning traces significantly less effective for distillation while preserving the model's practical utility.
(2025-04-17T17:54:14Z)
- Denoising Score Distillation: From Noisy Diffusion Pretraining to One-Step High-Quality Generation [82.39763984380625]
We introduce denoising score distillation (DSD), a surprisingly effective and novel approach for training high-quality generative models from low-quality data. DSD pretrains a diffusion model exclusively on noisy, corrupted samples and then distills it into a one-step generator capable of producing refined, clean outputs.
(2025-03-10T17:44:46Z)
- Inference-Time Diffusion Model Distillation [59.350789627086456]
We introduce Distillation++, a novel inference-time distillation framework. Inspired by recent advances in conditional sampling, our approach recasts student model sampling as a proximal optimization problem. We integrate distillation optimization during reverse sampling, which can be viewed as teacher guidance.
(2024-12-12T02:07:17Z)
- DDIL: Improved Diffusion Distillation With Imitation Learning [57.3467234269487]
Diffusion models excel at generative modeling (e.g., text-to-image) but sampling requires multiple denoising network passes.
Progressive distillation and consistency distillation have shown promise by reducing the number of passes.
We show that DDIL consistently improves on the baseline algorithms of progressive distillation (PD), Latent Consistency Models (LCM), and Distribution Matching Distillation (DMD2).
(2024-10-15T18:21:47Z)
- Accelerating Diffusion Models with One-to-Many Knowledge Distillation [35.130782477699704]
We introduce one-to-many knowledge distillation (O2MKD), which distills a single teacher diffusion model into multiple student diffusion models.
Experiments on CIFAR10, LSUN Church, CelebA-HQ with DDPM and COCO30K with Stable Diffusion show that O2MKD can be applied to previous knowledge distillation and fast sampling methods to achieve significant acceleration.
(2024-10-05T15:10:04Z)
- Variational Distillation of Diffusion Policies into Mixture of Experts [26.315682445979302]
This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE).
Diffusion Models are the current state-of-the-art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions.
VDD is the first method that distills pre-trained diffusion models into MoE models, and hence, combines the expressiveness of Diffusion Models with the benefits of Mixture Models.
(2024-06-18T12:15:05Z)
- EM Distillation for One-step Diffusion Models [65.57766773137068]
We propose a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of quality.
We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilize the distillation process.
(2024-05-27T05:55:22Z)
- Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For an efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
(2024-05-09T17:59:40Z)
- Diversity Matters When Learning From Ensembles [20.05842308307947]
Deep ensembles excel in large-scale image classification tasks both in terms of prediction accuracy and calibration.
Despite being simple to train, the computation and memory cost of deep ensembles limits their practicability.
We propose a simple approach for reducing the performance gap between a model distilled from an ensemble and the full ensemble, i.e., bringing the distilled model's performance close to that of the full ensemble.
(2021-10-27T03:44:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.