Multi-student Diffusion Distillation for Better One-step Generators
- URL: http://arxiv.org/abs/2410.23274v1
- Date: Wed, 30 Oct 2024 17:54:56 GMT
- Title: Multi-student Diffusion Distillation for Better One-step Generators
- Authors: Yanke Song, Jonathan Lorraine, Weili Nie, Karsten Kreis, James Lucas
- Abstract summary: Multi-Student Distillation (MSD) is a framework to distill a conditional teacher diffusion model into multiple single-step generators.
MSD trains multiple distilled students, allowing smaller sizes and, therefore, faster inference.
Using 4 same-sized students, MSD sets a new state-of-the-art for one-step image generation: FID 1.20 on ImageNet-64x64 and 8.20 on zero-shot COCO2014.
- Score: 29.751205880199855
- License:
- Abstract: Diffusion models achieve high-quality sample generation at the cost of a lengthy multistep inference procedure. To overcome this, diffusion distillation techniques produce student generators capable of matching or surpassing the teacher in a single step. However, the student model's inference speed is limited by the size of the teacher architecture, preventing real-time generation for computationally heavy applications. In this work, we introduce Multi-Student Distillation (MSD), a framework to distill a conditional teacher diffusion model into multiple single-step generators. Each student generator is responsible for a subset of the conditioning data, thereby obtaining higher generation quality for the same capacity. MSD trains multiple distilled students, allowing smaller sizes and, therefore, faster inference. Also, MSD offers a lightweight quality boost over single-student distillation with the same architecture. We demonstrate MSD is effective by training multiple same-sized or smaller students on single-step distillation using distribution matching and adversarial distillation techniques. With smaller students, MSD gets competitive results with faster inference for single-step generation. Using 4 same-sized students, MSD sets a new state-of-the-art for one-step image generation: FID 1.20 on ImageNet-64x64 and 8.20 on zero-shot COCO2014.
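As a rough illustration of the condition-partitioning idea in the abstract (not the authors' implementation), the sketch below splits class labels evenly across a few small one-step students and routes each sample to the student responsible for its label. The tiny MLP student, the even label split, and all sizes are assumptions made for the example; the actual distillation losses (distribution matching, adversarial) are omitted.

```python
# Minimal sketch of condition-partitioned one-step students (illustrative only).
# The tiny MLP "student" and even label split are assumptions; MSD distills each
# student from a teacher diffusion model, which is not shown here.
import torch
import torch.nn as nn

NUM_CLASSES, NUM_STUDENTS, IMG_DIM = 1000, 4, 64 * 64 * 3

class TinyStudent(nn.Module):
    """Stand-in one-step generator: maps (noise, class embedding) -> image."""
    def __init__(self, img_dim=IMG_DIM, num_classes=NUM_CLASSES, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(num_classes, hidden)
        self.net = nn.Sequential(
            nn.Linear(img_dim + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, img_dim),
        )

    def forward(self, noise, labels):
        h = torch.cat([noise, self.embed(labels)], dim=-1)
        return self.net(h)  # one forward pass = one sampling step

# One student per contiguous chunk of class labels (assumed routing rule).
students = [TinyStudent() for _ in range(NUM_STUDENTS)]
chunk = NUM_CLASSES // NUM_STUDENTS

def route(label: int) -> int:
    return min(label // chunk, NUM_STUDENTS - 1)

@torch.no_grad()
def generate(labels: torch.Tensor) -> torch.Tensor:
    """Single-step generation: each sample is produced by the student
    responsible for its class label."""
    noise = torch.randn(labels.shape[0], IMG_DIM)
    out = torch.empty_like(noise)
    for idx in range(NUM_STUDENTS):
        mask = torch.tensor([route(int(l)) == idx for l in labels])
        if mask.any():
            out[mask] = students[idx](noise[mask], labels[mask])
    return out

samples = generate(torch.randint(0, NUM_CLASSES, (8,)))
print(samples.shape)  # torch.Size([8, 12288])
```

Because each student only ever handles its own slice of the conditioning space, it can be smaller than the teacher while inference remains a single forward pass per sample.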
Related papers
- Warmup-Distill: Bridge the Distribution Mismatch between Teacher and Student before Knowledge Distillation [84.38105530043741]
We propose Warmup-Distill, which aligns the student with the teacher in advance of distillation to reduce their distribution mismatch.
Experiments on seven benchmarks demonstrate that Warmup-Distill can provide a warmed-up student better suited for subsequent distillation.
arXiv Detail & Related papers (2025-02-17T12:58:12Z)
- DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization [50.30051934609654]
We introduce a distillation method that combines variational score distillation and consistency distillation to achieve few-step video generation.
Our method demonstrates state-of-the-art performance in few-step generation for 10-second videos (128 frames at 12 FPS).
One-step distillation accelerates the teacher model's diffusion sampling by up to 278.6 times, enabling near real-time generation.
arXiv Detail & Related papers (2024-12-20T09:07:36Z)
- From Slow Bidirectional to Fast Autoregressive Video Diffusion Models [52.32078428442281]
Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies.
We address this limitation by adapting a pretrained bidirectional diffusion transformer to an autoregressive transformer that generates frames on-the-fly.
Our model achieves a total score of 84.27 on the VBench-Long benchmark, surpassing all previous video generation models.
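The abstract does not describe the architecture in detail; as a loose illustration of moving from bidirectional to frame-autoregressive attention, the snippet below builds a block-causal mask in which tokens attend freely within their own frame but only to earlier frames. Frame and token counts are placeholders, not values from the paper.

```python
# Illustrative block-causal attention mask for frame-autoregressive generation.
# Tokens attend bidirectionally within a frame but only causally across frames.
import torch

def block_causal_mask(num_frames: int, tokens_per_frame: int) -> torch.Tensor:
    frame_ids = torch.arange(num_frames).repeat_interleave(tokens_per_frame)
    # True where attention is allowed: query frame >= key frame.
    return frame_ids[:, None] >= frame_ids[None, :]

mask = block_causal_mask(num_frames=4, tokens_per_frame=3)
print(mask.int())
# Usable as a boolean attn_mask, e.g. with
# torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```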
arXiv Detail & Related papers (2024-12-10T18:59:50Z)
- AMD: Automatic Multi-step Distillation of Large-scale Vision Models [39.70559487432038]
We present a novel approach named Automatic Multi-step Distillation (AMD) for large-scale vision model compression.
An efficient and effective optimization framework is introduced to automatically identify the teacher-assistant that yields the best student performance.
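A minimal sketch of distilling through an intermediate teacher-assistant follows, with a brute-force search standing in for the paper's optimization framework (which the abstract does not spell out). The toy MLPs, random data, and KL-based distillation loss are assumptions for illustration only.

```python
# Rough sketch: distill teacher -> assistant -> student, trying a few assistant
# widths and keeping the best student (stand-in for the automatic selection).
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(width):
    return nn.Sequential(nn.Linear(32, width), nn.ReLU(), nn.Linear(width, 10))

def distill(teacher, student, steps=200):
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.randn(64, 32)
        with torch.no_grad():
            t = F.softmax(teacher(x), dim=-1)
        loss = F.kl_div(F.log_softmax(student(x), dim=-1), t, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()
    return student

teacher = mlp(512).eval()
x_val = torch.randn(256, 32)
with torch.no_grad():
    t_val = teacher(x_val)

best = None
for assistant_width in (256, 128):                 # candidate assistants
    assistant = distill(teacher, mlp(assistant_width))
    student = distill(assistant, mlp(32))          # teacher -> assistant -> student
    with torch.no_grad():
        agreement = (student(x_val).argmax(-1) == t_val.argmax(-1)).float().mean()
    if best is None or agreement > best[0]:
        best = (agreement.item(), assistant_width, student)
print(f"best assistant width: {best[1]}, agreement with teacher: {best[0]:.3f}")
```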
arXiv Detail & Related papers (2024-07-05T01:35:42Z)
- Diffusion Models Are Innate One-Step Generators [2.3359837623080613]
Diffusion Models (DMs) can generate remarkably high-quality results.
DMs' layers are differentially activated at different time steps, leading to an inherent capability to generate images in a single step.
Our method achieves the SOTA results on CIFAR-10, AFHQv2 64x64 (FID 1.23), FFHQ 64x64 (FID 0.85) and ImageNet 64x64 (FID 1.16) with great efficiency.
arXiv Detail & Related papers (2024-05-31T11:14:12Z)
- Improved Distribution Matching Distillation for Fast Image Synthesis [54.72356560597428]
We introduce DMD2, a set of techniques that lift this limitation and improve DMD training.
First, we eliminate the regression loss and the need for expensive dataset construction.
Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images.
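In the spirit of the summary above, the sketch below adds a non-saturating GAN term to a placeholder distillation objective, with a discriminator trained to separate generated samples from real images. The architectures, loss weighting, and the distribution-matching term itself are assumptions, not taken from the paper.

```python
# Sketch of adding a GAN term to a distillation objective (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

gen = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))   # one-step generator
disc = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 1))   # real vs. fake
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

def distill_loss(fake: torch.Tensor) -> torch.Tensor:
    # Placeholder for the distribution-matching term; zero in this sketch.
    return fake.new_zeros(())

for _ in range(10):                               # toy training loop
    real = torch.randn(32, 64)                    # stand-in for real images
    noise = torch.randn(32, 64)

    # Discriminator step: real images vs. generator samples.
    fake = gen(noise).detach()
    d_loss = F.binary_cross_entropy_with_logits(disc(real), torch.ones(32, 1)) + \
             F.binary_cross_entropy_with_logits(disc(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: distillation term plus non-saturating GAN term.
    fake = gen(noise)
    g_loss = distill_loss(fake) + \
             F.binary_cross_entropy_with_logits(disc(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```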
arXiv Detail & Related papers (2024-05-23T17:59:49Z)
- One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image.
Our method enables fully offline training with just noise/image pairs from the diffusion model.
We demonstrate that the DEQ architecture is crucial to this capability, as GET (Generative Equilibrium Transformer) matches a $5\times$ larger ViT in terms of FID scores.
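At its simplest, the offline setup described above reduces to supervised regression from teacher noise/image pairs. The sketch below uses random stand-in pairs, a plain MLP, and an MSE loss, none of which are the paper's actual choices (it uses a DEQ-based GET architecture).

```python
# Minimal sketch of offline distillation from (noise, image) pairs. The pairs
# here are random stand-ins; in practice they would be sampled once from the
# teacher diffusion model.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

noise = torch.randn(1024, 64)                 # initial noise fed to the teacher
images = torch.randn(1024, 64)                # teacher's corresponding samples
loader = DataLoader(TensorDataset(noise, images), batch_size=128, shuffle=True)

student = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

for epoch in range(3):
    for z, x in loader:                       # fully offline: no teacher calls
        loss = F.mse_loss(student(z), x)
        opt.zero_grad(); loss.backward(); opt.step()
```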
arXiv Detail & Related papers (2023-12-12T07:28:40Z)
- Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation [65.28708064066764]
We propose a simple yet effective method called Momentum Adversarial Distillation (MAD).
MAD maintains an exponential moving average (EMA) copy of the generator and uses synthetic samples from both the generator and the EMA generator to train the student.
Our experiments on six benchmark datasets, including large datasets such as ImageNet and Places365, demonstrate the superior performance of MAD over competing methods.
arXiv Detail & Related papers (2022-09-21T13:53:56Z)
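A minimal sketch of the EMA-generator idea in the MAD summary: keep an exponential moving average copy of the sample generator and train the student on synthetic inputs from both copies. The architectures, the KL matching loss, the decay value, and the omission of the adversarial generator update are simplifications for illustration, not the paper's exact setup.

```python
# Sketch: EMA copy of the generator; the student learns from synthetic samples
# drawn from both the current generator and its slower-moving EMA copy.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10)).eval()
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))
ema_generator = copy.deepcopy(generator)

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)

@torch.no_grad()
def ema_update(ema: nn.Module, online: nn.Module, decay: float = 0.99) -> None:
    for p_ema, p in zip(ema.parameters(), online.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

for _ in range(100):
    z = torch.randn(64, 16)
    # Synthetic inputs from the current generator and its EMA copy.
    with torch.no_grad():
        x = torch.cat([generator(z), ema_generator(z)], dim=0)
        targets = F.softmax(teacher(x), dim=-1)
    loss = F.kl_div(F.log_softmax(student(x), dim=-1), targets, reduction="batchmean")
    opt_s.zero_grad(); loss.backward(); opt_s.step()
    # (The adversarial generator update is omitted; only EMA tracking is shown.)
    ema_update(ema_generator, generator)
```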