Multi-student Diffusion Distillation for Better One-step Generators
- URL: http://arxiv.org/abs/2410.23274v1
- Date: Wed, 30 Oct 2024 17:54:56 GMT
- Title: Multi-student Diffusion Distillation for Better One-step Generators
- Authors: Yanke Song, Jonathan Lorraine, Weili Nie, Karsten Kreis, James Lucas
- Abstract summary: Multi-Student Distillation (MSD) is a framework to distill a conditional teacher diffusion model into multiple single-step generators.
MSD trains multiple distilled students, allowing smaller sizes and, therefore, faster inference.
Using 4 same-sized students, MSD sets a new state-of-the-art for one-step image generation: FID 1.20 on ImageNet-64x64 and 8.20 on zero-shot COCO2014.
- Score: 29.751205880199855
- License:
- Abstract: Diffusion models achieve high-quality sample generation at the cost of a lengthy multistep inference procedure. To overcome this, diffusion distillation techniques produce student generators capable of matching or surpassing the teacher in a single step. However, the student model's inference speed is limited by the size of the teacher architecture, preventing real-time generation for computationally heavy applications. In this work, we introduce Multi-Student Distillation (MSD), a framework to distill a conditional teacher diffusion model into multiple single-step generators. Each student generator is responsible for a subset of the conditioning data, thereby obtaining higher generation quality for the same capacity. MSD trains multiple distilled students, allowing smaller sizes and, therefore, faster inference. Also, MSD offers a lightweight quality boost over single-student distillation with the same architecture. We demonstrate MSD is effective by training multiple same-sized or smaller students on single-step distillation using distribution matching and adversarial distillation techniques. With smaller students, MSD gets competitive results with faster inference for single-step generation. Using 4 same-sized students, MSD sets a new state-of-the-art for one-step image generation: FID 1.20 on ImageNet-64x64 and 8.20 on zero-shot COCO2014.
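As a rough illustration of the condition-partitioning idea in the abstract (not the authors' implementation), the sketch below splits class labels evenly across a few small one-step students and routes each sample to the student responsible for its label. The tiny MLP student, the even label split, and all sizes are assumptions made for the example; the actual distillation losses (distribution matching, adversarial) are omitted.

```python
# Minimal sketch of condition-partitioned one-step students (illustrative only).
# The tiny MLP "student" and even label split are assumptions; MSD distills each
# student from a teacher diffusion model, which is not shown here.
import torch
import torch.nn as nn

NUM_CLASSES, NUM_STUDENTS, IMG_DIM = 1000, 4, 64 * 64 * 3

class TinyStudent(nn.Module):
    """Stand-in one-step generator: maps (noise, class embedding) -> image."""
    def __init__(self, img_dim=IMG_DIM, num_classes=NUM_CLASSES, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(num_classes, hidden)
        self.net = nn.Sequential(
            nn.Linear(img_dim + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, img_dim),
        )

    def forward(self, noise, labels):
        h = torch.cat([noise, self.embed(labels)], dim=-1)
        return self.net(h)  # one forward pass = one sampling step

# One student per contiguous chunk of class labels (assumed routing rule).
students = [TinyStudent() for _ in range(NUM_STUDENTS)]
chunk = NUM_CLASSES // NUM_STUDENTS

def route(label: int) -> int:
    return min(label // chunk, NUM_STUDENTS - 1)

@torch.no_grad()
def generate(labels: torch.Tensor) -> torch.Tensor:
    """Single-step generation: each sample is produced by the student
    responsible for its class label."""
    noise = torch.randn(labels.shape[0], IMG_DIM)
    out = torch.empty_like(noise)
    for idx in range(NUM_STUDENTS):
        mask = torch.tensor([route(int(l)) == idx for l in labels])
        if mask.any():
            out[mask] = students[idx](noise[mask], labels[mask])
    return out

samples = generate(torch.randint(0, NUM_CLASSES, (8,)))
print(samples.shape)  # torch.Size([8, 12288])
```

Because each student only ever handles its own slice of the conditioning space, it can be smaller than the teacher while inference remains a single forward pass per sample.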
Related papers
- Warmup-Distill: Bridge the Distribution Mismatch between Teacher and Student before Knowledge Distillation [84.38105530043741]
We propose Warmup-Distill, which aligns the student with the teacher in advance of distillation to reduce their distribution mismatch.
Experiments on seven benchmarks demonstrate that Warmup-Distill can provide a warmed-up student better suited for subsequent distillation.
arXiv Detail & Related papers (2025-02-17T12:58:12Z)
- DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization [50.30051934609654]
We introduce a distillation method that combines variational score distillation and consistency distillation to achieve few-step video generation.
Our method demonstrates state-of-the-art performance in few-step generation for 10-second videos (128 frames at 12 FPS).
One-step distillation accelerates the teacher model's diffusion sampling by up to 278.6 times, enabling near real-time generation.
arXiv Detail & Related papers (2024-12-20T09:07:36Z)
- From Slow Bidirectional to Fast Autoregressive Video Diffusion Models [52.32078428442281]
Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies.
We address this limitation by adapting a pretrained bidirectional diffusion transformer to an autoregressive transformer that generates frames on-the-fly.
Our model achieves a total score of 84.27 on the VBench-Long benchmark, surpassing all previous video generation models.
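The abstract does not describe the architecture in detail; as a loose illustration of moving from bidirectional to frame-autoregressive attention, the snippet below builds a block-causal mask in which tokens attend freely within their own frame but only to earlier frames. Frame and token counts are placeholders, not values from the paper.

```python
# Illustrative block-causal attention mask for frame-autoregressive generation.
# Tokens attend bidirectionally within a frame but only causally across frames.
import torch

def block_causal_mask(num_frames: int, tokens_per_frame: int) -> torch.Tensor:
    frame_ids = torch.arange(num_frames).repeat_interleave(tokens_per_frame)
    # True where attention is allowed: query frame >= key frame.
    return frame_ids[:, None] >= frame_ids[None, :]

mask = block_causal_mask(num_frames=4, tokens_per_frame=3)
print(mask.int())
# Usable as a boolean attn_mask, e.g. with
# torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```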
arXiv Detail & Related papers (2024-12-10T18:59:50Z)
- AMD: Automatic Multi-step Distillation of Large-scale Vision Models [39.70559487432038]
We present a novel approach named Automatic Multi-step Distillation (AMD) for large-scale vision model compression.
An efficient and effective optimization framework is introduced to automatically identify the teacher-assistant that yields the best student performance.
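A minimal sketch of distilling through an intermediate teacher-assistant follows, with a brute-force search standing in for the paper's optimization framework (which the abstract does not spell out). The toy MLPs, random data, and KL-based distillation loss are assumptions for illustration only.

```python
# Rough sketch: distill teacher -> assistant -> student, trying a few assistant
# widths and keeping the best student (stand-in for the automatic selection).
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(width):
    return nn.Sequential(nn.Linear(32, width), nn.ReLU(), nn.Linear(width, 10))

def distill(teacher, student, steps=200):
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.randn(64, 32)
        with torch.no_grad():
            t = F.softmax(teacher(x), dim=-1)
        loss = F.kl_div(F.log_softmax(student(x), dim=-1), t, reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()
    return student

teacher = mlp(512).eval()
x_val = torch.randn(256, 32)
with torch.no_grad():
    t_val = teacher(x_val)

best = None
for assistant_width in (256, 128):                 # candidate assistants
    assistant = distill(teacher, mlp(assistant_width))
    student = distill(assistant, mlp(32))          # teacher -> assistant -> student
    with torch.no_grad():
        agreement = (student(x_val).argmax(-1) == t_val.argmax(-1)).float().mean()
    if best is None or agreement > best[0]:
        best = (agreement.item(), assistant_width, student)
print(f"best assistant width: {best[1]}, agreement with teacher: {best[0]:.3f}")
```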
arXiv Detail & Related papers (2024-07-05T01:35:42Z)
- Diffusion Models Are Innate One-Step Generators [2.3359837623080613]
Diffusion Models (DMs) can generate remarkably high-quality results.
DMs' layers are differentially activated at different time steps, leading to an inherent capability to generate images in a single step.
Our method achieves the SOTA results on CIFAR-10, AFHQv2 64x64 (FID 1.23), FFHQ 64x64 (FID 0.85) and ImageNet 64x64 (FID 1.16) with great efficiency.
arXiv Detail & Related papers (2024-05-31T11:14:12Z)
- Improved Distribution Matching Distillation for Fast Image Synthesis [54.72356560597428]
We introduce DMD2, a set of techniques that lift this limitation and improve DMD training.
First, we eliminate the regression loss and the need for expensive dataset construction.
Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images.
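In the spirit of the summary above, the sketch below adds a non-saturating GAN term to a placeholder distillation objective, with a discriminator trained to separate generated samples from real images. The architectures, loss weighting, and the distribution-matching term itself are assumptions, not taken from the paper.

```python
# Sketch of adding a GAN term to a distillation objective (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

gen = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))   # one-step generator
disc = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 1))   # real vs. fake
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

def distill_loss(fake: torch.Tensor) -> torch.Tensor:
    # Placeholder for the distribution-matching term; zero in this sketch.
    return fake.new_zeros(())

for _ in range(10):                               # toy training loop
    real = torch.randn(32, 64)                    # stand-in for real images
    noise = torch.randn(32, 64)

    # Discriminator step: real images vs. generator samples.
    fake = gen(noise).detach()
    d_loss = F.binary_cross_entropy_with_logits(disc(real), torch.ones(32, 1)) + \
             F.binary_cross_entropy_with_logits(disc(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: distillation term plus non-saturating GAN term.
    fake = gen(noise)
    g_loss = distill_loss(fake) + \
             F.binary_cross_entropy_with_logits(disc(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```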
arXiv Detail & Related papers (2024-05-23T17:59:49Z)
- One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image.
Our method enables fully offline training with just noise/image pairs from the diffusion model.
We demonstrate that the DEQ architecture is crucial to this capability, as GET (Generative Equilibrium Transformer) matches a $5\times$ larger ViT in terms of FID scores.
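At its simplest, the offline setup described above reduces to supervised regression from teacher noise/image pairs. The sketch below uses random stand-in pairs, a plain MLP, and an MSE loss, none of which are the paper's actual choices (it uses a DEQ-based GET architecture).

```python
# Minimal sketch of offline distillation from (noise, image) pairs. The pairs
# here are random stand-ins; in practice they would be sampled once from the
# teacher diffusion model.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

noise = torch.randn(1024, 64)                 # initial noise fed to the teacher
images = torch.randn(1024, 64)                # teacher's corresponding samples
loader = DataLoader(TensorDataset(noise, images), batch_size=128, shuffle=True)

student = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

for epoch in range(3):
    for z, x in loader:                       # fully offline: no teacher calls
        loss = F.mse_loss(student(z), x)
        opt.zero_grad(); loss.backward(); opt.step()
```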
arXiv Detail & Related papers (2023-12-12T07:28:40Z)
- Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation [65.28708064066764]
We propose a simple yet effective method called Momentum Adversarial Distillation (MAD).
MAD maintains an exponential moving average (EMA) copy of the generator and uses synthetic samples from both the generator and the EMA generator to train the student.
Our experiments on six benchmark datasets, including large datasets such as ImageNet and Places365, demonstrate the superior performance of MAD over competing methods.
arXiv Detail & Related papers (2022-09-21T13:53:56Z)
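A minimal sketch of the EMA-generator idea in the MAD summary: keep an exponential moving average copy of the sample generator and train the student on synthetic inputs from both copies. The architectures, the KL matching loss, the decay value, and the omission of the adversarial generator update are simplifications for illustration, not the paper's exact setup.

```python
# Sketch: EMA copy of the generator; the student learns from synthetic samples
# drawn from both the current generator and its slower-moving EMA copy.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10)).eval()
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))
ema_generator = copy.deepcopy(generator)

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)

@torch.no_grad()
def ema_update(ema: nn.Module, online: nn.Module, decay: float = 0.99) -> None:
    for p_ema, p in zip(ema.parameters(), online.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

for _ in range(100):
    z = torch.randn(64, 16)
    # Synthetic inputs from the current generator and its EMA copy.
    with torch.no_grad():
        x = torch.cat([generator(z), ema_generator(z)], dim=0)
        targets = F.softmax(teacher(x), dim=-1)
    loss = F.kl_div(F.log_softmax(student(x), dim=-1), targets, reduction="batchmean")
    opt_s.zero_grad(); loss.backward(); opt_s.step()
    # (The adversarial generator update is omitted; only EMA tracking is shown.)
    ema_update(ema_generator, generator)
```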