On Distillation of Guided Diffusion Models
- URL: http://arxiv.org/abs/2210.03142v3
- Date: Wed, 12 Apr 2023 21:23:35 GMT
- Title: On Distillation of Guided Diffusion Models
- Authors: Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik P. Kingma, Stefano
Ermon, Jonathan Ho, Tim Salimans
- Abstract summary: We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained in pixel space, our approach is able to generate images visually comparable to those of the original model.
For diffusion models trained in latent space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps.
- Score: 94.95228078141626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifier-free guided diffusion models have recently been shown to be highly
effective at high-resolution image generation, and they have been widely used
in large-scale diffusion frameworks including DALL-E 2, Stable Diffusion and
Imagen. However, a downside of classifier-free guided diffusion models is that
they are computationally expensive at inference time since they require
evaluating two diffusion models, a class-conditional model and an unconditional
model, tens to hundreds of times. To deal with this limitation, we propose an
approach to distilling classifier-free guided diffusion models into models that
are fast to sample from: Given a pre-trained classifier-free guided model, we
first learn a single model to match the output of the combined conditional and
unconditional models, and then we progressively distill that model into a
diffusion model that requires far fewer sampling steps. For standard diffusion
models trained on the pixel-space, our approach is able to generate images
visually comparable to those of the original model using as few as 4 sampling
steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to
those of the original model while being up to 256 times faster to sample from.
For diffusion models trained on the latent-space (e.g., Stable Diffusion), our
approach is able to generate high-fidelity images using as few as 1 to 4
denoising steps, accelerating inference by at least 10-fold compared to
existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate
the effectiveness of our approach on text-guided image editing and inpainting,
where our distilled model is able to generate high-quality results using as few
as 2-4 denoising steps.
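As a reading aid, here is a minimal sketch of the two-stage recipe the abstract describes, assuming PyTorch-style eps-prediction networks. The names `teacher`, `student`, `ddim_step`, and the exact losses are illustrative assumptions, not the authors' code; the paper's actual objectives are weighted differently.

```python
import torch
import torch.nn.functional as F

# Stage 1 (sketch): learn a single w-conditioned student that matches the
# classifier-free-guided teacher, whose prediction combines a conditional and
# an unconditional pass as eps_hat = (1 + w) * eps_cond - w * eps_uncond.
# `teacher(x_t, t, c)` and `student(x_t, t, c, w)` are assumed eps-predictors.
def stage1_loss(teacher, student, x_t, t, cond, uncond, w):
    with torch.no_grad():
        eps_cond = teacher(x_t, t, cond)
        eps_uncond = teacher(x_t, t, uncond)
        eps_guided = (1 + w) * eps_cond - w * eps_uncond
    return F.mse_loss(student(x_t, t, cond, w), eps_guided)

# Stage 2 (sketch): progressive distillation. A new student learns to match
# two deterministic (DDIM-style) steps of the previous model with a single
# step, so the required number of sampling steps halves each round
# (e.g. 1024 -> 512 -> ... -> 4). `ddim_step(model, x, t_from, t_to, cond, w)`
# is an assumed one-step deterministic sampler update.
def stage2_loss(prev_model, student, ddim_step, x_t, t, t_mid, t_next, cond, w):
    with torch.no_grad():
        x_mid = ddim_step(prev_model, x_t, t, t_mid, cond, w)            # teacher step 1
        x_target = ddim_step(prev_model, x_mid, t_mid, t_next, cond, w)  # teacher step 2
    x_pred = ddim_step(student, x_t, t, t_next, cond, w)                 # one student step
    return F.mse_loss(x_pred, x_target)
```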
Related papers
- One-Step Diffusion Distillation through Score Implicit Matching [74.91234358410281]
We present Score Implicit Matching (SIM), a new approach to distilling pre-trained diffusion models into single-step generator models.
SIM shows strong empirical performance for one-step generators.
By applying SIM to a leading transformer-based diffusion model, we distill a single-step generator for text-to-image generation.
arXiv Detail & Related papers (2024-10-22T08:17:20Z)
- Multistep Distillation of Diffusion Models via Moment Matching [29.235113968156433]
We present a new method for making diffusion models faster to sample.
The method distills many-step diffusion models into few-step models by matching conditional expectations of the clean data.
We obtain new state-of-the-art results on the ImageNet dataset.
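The summary above only names the mechanism (matching conditional expectations of the clean data); the schematic below shows what such a distillation loss could look like. The `predict_x0` interface is an assumption, and the paper's actual objective and training procedure are more involved.

```python
import torch
import torch.nn.functional as F

# Schematic only: both teacher and student are assumed to expose an
# x0-prediction predict_x0(x_t, t) ~ E[x_0 | x_t]. Distillation by moment
# matching pushes the few-step student to agree with the many-step teacher
# on this conditional expectation of the clean data.
def moment_matching_loss(teacher, student, x_t, t):
    with torch.no_grad():
        x0_teacher = teacher.predict_x0(x_t, t)   # teacher's E[x_0 | x_t]
    x0_student = student.predict_x0(x_t, t)       # student's E[x_0 | x_t]
    return F.mse_loss(x0_student, x0_teacher)
```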
arXiv Detail & Related papers (2024-06-06T14:20:21Z)
- Plug-and-Play Diffusion Distillation [14.359953671470242]
We propose a new distillation approach for guided diffusion models.
An external lightweight guide model is trained while the original text-to-image model remains frozen.
We show that our method reduces the inference cost of classifier-free guided latent-space diffusion models by almost half.
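The summary does not spell out what the guide model does, so the sketch below is only one plausible reading: a small trainable module is fit to reproduce the classifier-free-guided prediction from a single conditional pass of the frozen base model, so inference needs roughly one large-model evaluation instead of two. All module names, shapes, and the residual design are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightGuide(nn.Module):
    """Toy stand-in for an external guide: maps the frozen model's conditional
    eps-prediction to a guided prediction via a small residual network.
    Purely illustrative; not the architecture used in the paper."""
    def __init__(self, channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, eps_cond: torch.Tensor) -> torch.Tensor:
        return eps_cond + self.net(eps_cond)  # residual correction

def guide_training_loss(frozen_model, guide, x_t, t, cond, uncond, w):
    with torch.no_grad():  # the original text-to-image model stays frozen
        eps_cond = frozen_model(x_t, t, cond)
        eps_uncond = frozen_model(x_t, t, uncond)
        target = (1 + w) * eps_cond - w * eps_uncond  # classifier-free guidance target
    return F.mse_loss(guide(eps_cond), target)
```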
arXiv Detail & Related papers (2024-06-04T04:22:47Z)
- Directly Denoising Diffusion Models [6.109141407163027]
We present Directly Denoising Diffusion Model (DDDM), a simple and generic approach for generating realistic images with few-step sampling.
Our model achieves FID scores of 2.57 and 2.33 on CIFAR-10 in one-step and two-step sampling respectively, surpassing those obtained from GANs and distillation-based models.
For ImageNet 64x64, our approach stands as a competitive contender against leading models.
arXiv Detail & Related papers (2024-05-22T11:20:32Z)
- Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For an efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
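E-LatentLPIPS itself is a trained, calibrated perceptual metric with ensembled augmentations; the toy module below only illustrates the general shape of a perceptual loss computed directly on latents, with an untrained feature extractor standing in for the real one. Channel counts and layer choices are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLatentPerceptualLoss(nn.Module):
    """Illustrative stand-in for a perceptual loss computed directly in a
    diffusion model's latent space (no decoding to pixels). The real
    E-LatentLPIPS uses a trained feature network and ensembled augmentations."""
    def __init__(self, latent_channels: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(latent_channels, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
        )

    def forward(self, latent_pred: torch.Tensor, latent_target: torch.Tensor) -> torch.Tensor:
        f_pred = self.features(latent_pred)
        f_tgt = self.features(latent_target)
        # distance between channel-normalized feature maps, LPIPS-style
        return F.mse_loss(F.normalize(f_pred, dim=1), F.normalize(f_tgt, dim=1))
```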
arXiv Detail & Related papers (2024-05-09T17:59:40Z)
- Adversarial Diffusion Distillation [18.87099764514747]
Adversarial Diffusion Distillation (ADD) is a novel training approach that enables sampling large-scale foundational image diffusion models in just 1-4 steps.
We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal.
Our model clearly outperforms existing few-step methods in a single step and reaches the performance of state-of-the-art diffusion models (SDXL) in only four steps.
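Per the summary, ADD combines an adversarial objective with distillation from a pre-trained teacher; the function below sketches only that combination. The concrete losses, weighting, and discriminator design in the paper differ, and all names here are illustrative.

```python
import torch
import torch.nn.functional as F

# Schematic combination of the two training signals mentioned above: an
# adversarial term from a discriminator on the student's one-step samples,
# plus a distillation term pulling the student's output toward the teacher's
# denoised estimate. Not the paper's exact objective.
def add_style_loss(student, teacher, discriminator, x_t, t, cond, lambda_distill=1.0):
    x0_student = student(x_t, t, cond)                  # one-step generation
    adv = -discriminator(x0_student, cond).mean()       # adversarial (generator) term
    with torch.no_grad():
        x0_teacher = teacher.predict_x0(x_t, t, cond)   # teacher's denoised estimate as target
    distill = F.mse_loss(x0_student, x0_teacher)
    return adv + lambda_distill * distill
```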
arXiv Detail & Related papers (2023-11-28T18:53:24Z)
- Consistency Models [89.68380014789861]
We propose a new family of models that generate high quality samples by directly mapping noise to data.
They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality.
They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training.
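A minimal sketch of the one-step / multistep trade-off described above: a consistency model f maps a noisy input at noise level sigma directly to data, and optional extra steps re-noise the estimate to a smaller sigma and re-apply f. The noise schedule and function signature are illustrative.

```python
import torch

@torch.no_grad()
def consistency_multistep_sample(f, shape, sigmas, sigma_min=0.002):
    """Sketch of multistep consistency sampling: f(x, sigma) maps a noisy
    sample directly to an estimate of clean data. Extra iterations re-noise
    to a smaller sigma and re-apply f, trading compute for sample quality.
    `sigmas` is a decreasing list of noise levels, e.g. [80.0, 20.0, 5.0]."""
    x = torch.randn(shape) * sigmas[0]
    x = f(x, sigmas[0])                       # one-step generation
    for sigma in sigmas[1:]:                  # optional refinement steps
        z = torch.randn_like(x)
        x_noisy = x + (sigma ** 2 - sigma_min ** 2) ** 0.5 * z
        x = f(x_noisy, sigma)
    return x
```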
arXiv Detail & Related papers (2023-03-02T18:30:16Z)
- SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, which leverages denoising diffusion models to capture the internal distribution of patches from a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z)
- Cascaded Diffusion Models for High Fidelity Image Generation [53.57766722279425]
We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge.
A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution.
We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation.
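A rough sketch of such a pipeline is below: a base model produces a low-resolution sample, and each super-resolution stage conditions on an upsampled, noise-augmented copy of the previous output (the conditioning augmentation mentioned above). Resolutions, the Gaussian augmentation, and all names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def cascaded_sample(base_sampler, sr_stages, class_label, noise_aug=0.1):
    """Sketch of a cascaded pipeline: a base diffusion model generates a
    low-resolution image, then each super-resolution stage generates a
    higher-resolution image conditioned on an upsampled, noise-augmented
    copy of the previous output. `sr_stages` is a list of (sampler, size)
    pairs, e.g. [(sr_64_to_256, 256)]; all details here are illustrative."""
    x = base_sampler(class_label)                         # e.g. a 64x64 base sample
    for sr_sampler, size in sr_stages:
        low_res = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
        low_res = low_res + noise_aug * torch.randn_like(low_res)  # conditioning augmentation
        x = sr_sampler(low_res, class_label)
    return x
```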
arXiv Detail & Related papers (2021-05-30T17:14:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.