One-Step Diffusion Distillation via Deep Equilibrium Models
- URL: http://arxiv.org/abs/2401.08639v1
- Date: Tue, 12 Dec 2023 07:28:40 GMT
- Title: One-Step Diffusion Distillation via Deep Equilibrium Models
- Authors: Zhengyang Geng and Ashwini Pokle and J. Zico Kolter
- Abstract summary: We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image.
Our method enables fully offline training with just noise/image pairs from the diffusion model.
We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5\times$ larger ViT in terms of FID scores.
- Score: 64.11782639697883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models excel at producing high-quality samples but naively require
hundreds of iterations, prompting multiple attempts to distill the generation
process into a faster network. However, many existing approaches suffer from a
variety of challenges: the process for distillation training can be complex,
often requiring multiple training stages, and the resulting models perform
poorly when utilized in single-step generative applications. In this paper, we
introduce a simple yet effective means of distilling diffusion models directly
from initial noise to the resulting image. Of particular importance to our
approach is to leverage a new Deep Equilibrium (DEQ) model as the distilled
architecture: the Generative Equilibrium Transformer (GET). Our method enables
fully offline training with just noise/image pairs from the diffusion model
while achieving superior performance compared to existing one-step methods on
comparable training budgets. We demonstrate that the DEQ architecture is
crucial to this capability, as GET matches a $5\times$ larger ViT in terms of
FID scores while striking a critical balance of computational cost and image
quality. Code, checkpoints, and datasets are available.
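
As a rough illustration of the distillation setup described in the abstract, the sketch below trains a DEQ-style student, a weight-tied transformer block iterated to an approximate fixed point, on precomputed noise/image pairs with a plain regression loss. The module, its sizes, the unrolled solver, and the loss are illustrative assumptions, not the paper's actual GET architecture or training recipe.

```python
# Minimal sketch of offline one-step distillation with a DEQ-style student.
# Assumptions (not from the paper): a single weight-tied transformer block,
# naive fixed-point iteration instead of an implicit solver, and a plain L2
# loss on precomputed (noise, teacher image) pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDEQGenerator(nn.Module):
    """Weight-tied block iterated toward an equilibrium z* = f(z* + inject(x))."""
    def __init__(self, dim=256, iters=12):
        super().__init__()
        self.iters = iters
        self.inject = nn.Linear(dim, dim)    # injects the noise input at every iteration
        self.block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=4 * dim, batch_first=True
        )
        self.readout = nn.Linear(dim, dim)

    def forward(self, x):                    # x: (batch, tokens, dim) noise tokens
        z = torch.zeros_like(x)
        inj = self.inject(x)
        for _ in range(self.iters):          # fixed-point iteration, unrolled for simplicity
            z = self.block(z + inj)
        return self.readout(z)

def distill_step(student, optimizer, noise, target):
    """One offline training step on a (noise, teacher output) pair."""
    loss = F.mse_loss(student(noise), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    student = TinyDEQGenerator()
    opt = torch.optim.Adam(student.parameters(), lr=1e-4)
    noise = torch.randn(8, 64, 256)          # stands in for sampled initial noise
    target = torch.randn(8, 64, 256)         # stands in for the teacher's generated images
    print(distill_step(student, opt, noise, target))
```

A real DEQ would typically replace the fixed unroll with a root-finding solver and implicit differentiation so memory stays constant in the number of iterations; the loop above only mimics the weight-tied structure.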
Related papers
- One Step Diffusion via Shortcut Models [109.72495454280627]
We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples.
Shortcut models condition the network on the current noise level and also on the desired step size, allowing the model to skip ahead in the generation process.
Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.
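
A minimal sketch of the conditioning idea described above, assuming a placeholder architecture and an Euler-style update for applying the predicted shortcut; neither is taken from the paper:

```python
# Illustrative sketch: a single network conditioned on both the current noise
# level t and the desired step size d, so one query can jump ahead by d.
# Architecture and update rule are placeholders, not the paper's model.
import torch
import torch.nn as nn

class ShortcutNet(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.proj_x = nn.Linear(dim, dim)
        self.embed_t = nn.Linear(1, dim)     # current noise level
        self.embed_d = nn.Linear(1, dim)     # requested step size
        self.body = nn.Sequential(nn.Linear(3 * dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, x, t, d):              # x: (B, dim); t, d: (B, 1)
        h = torch.cat([self.proj_x(x), self.embed_t(t), self.embed_d(d)], dim=-1)
        return self.body(h)                  # predicted direction for a jump of size d

def sample(net, x, n_steps):
    """Take n_steps equal shortcut jumps from t = 1 (noise) down to t = 0."""
    step = 1.0 / n_steps
    d = torch.full((x.shape[0], 1), step)
    for i in range(n_steps):
        t = torch.full((x.shape[0], 1), 1.0 - i * step)
        x = x + step * net(x, t, d)          # Euler-style update scaled by the step size
    return x

if __name__ == "__main__":
    net = ShortcutNet()
    print(sample(net, torch.randn(4, 128), n_steps=1).shape)   # one-step generation
```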
arXiv Detail & Related papers (2024-10-16T13:34:40Z)
- Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution [81.81748032199813]
We propose a Distillation-Free One-Step Diffusion model.
Specifically, we propose a noise-aware discriminator (NAD) to participate in adversarial training.
We improve the perceptual loss with edge-aware DISTS (EA-DISTS) to enhance the model's ability to generate fine details.
arXiv Detail & Related papers (2024-10-05T16:41:36Z)
- Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization [97.35427957922714]
We present an algorithm named pairwise sample optimization (PSO), which enables the direct fine-tuning of an arbitrary timestep-distilled diffusion model.
PSO introduces additional reference images sampled from the current time-step distilled model, and increases the relative likelihood margin between the training images and reference images.
We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data.
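
A rough sketch of a pairwise margin objective in the spirit of the description above; treating per-sample denoising losses as a likelihood surrogate and using a DPO-style logistic form are assumptions, not details from the paper:

```python
# Sketch of a pairwise objective that raises the relative likelihood margin of
# target images over reference images sampled from the current distilled model.
# Denoising losses as a likelihood surrogate and the logistic form are assumed.
import torch
import torch.nn.functional as F

def pairwise_margin_loss(loss_target, loss_ref, loss_target_init, loss_ref_init, beta=1.0):
    """All inputs are per-sample denoising losses; *_init come from the frozen starting model."""
    margin_target = loss_target_init - loss_target   # lower loss than init => higher likelihood
    margin_ref = loss_ref_init - loss_ref
    return -F.logsigmoid(beta * (margin_target - margin_ref)).mean()

if __name__ == "__main__":
    lt, lr = torch.rand(4), torch.rand(4)             # losses on target / reference images
    lt0, lr0 = torch.rand(4), torch.rand(4)           # same losses under the frozen init model
    print(pairwise_margin_loss(lt, lr, lt0, lr0).item())
```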
arXiv Detail & Related papers (2024-10-04T07:05:16Z)
- EM Distillation for One-step Diffusion Models [65.57766773137068]
We propose a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of quality.
We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilize the distillation process.
arXiv Detail & Related papers (2024-05-27T05:55:22Z)
- Adversarial Diffusion Distillation [18.87099764514747]
Adversarial Diffusion Distillation (ADD) is a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1-4 steps.
We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal.
Our model clearly outperforms existing few-step methods in a single step and reaches the performance of state-of-the-art diffusion models (SDXL) in only four steps.
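
A hedged sketch of how an adversarial term and a teacher-distillation term could be combined into a single student objective, as the summary above suggests; the re-noising, loss forms, and weighting are simplified placeholders rather than the ADD recipe:

```python
# Sketch of a combined student objective: an adversarial term from a
# discriminator plus a distillation term pulling the one-step output toward a
# frozen teacher's denoised estimate. All components are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def student_loss(student, teacher, discriminator, noise, t=0.7, lam=1.0):
    x_student = student(noise)                            # one-step student sample
    adv = F.softplus(-discriminator(x_student)).mean()    # "look real" to the discriminator
    eps = torch.randn_like(x_student)
    x_noisy = x_student + t * eps                         # simplified re-noising, not the true forward process
    with torch.no_grad():
        x_teacher = teacher(x_noisy, t)                   # frozen teacher's denoised estimate
    distill = F.mse_loss(x_student, x_teacher)
    return adv + lam * distill

if __name__ == "__main__":
    d = 16
    student = nn.Linear(d, d)
    teacher = lambda x, t: 0.5 * x                        # stand-in for a frozen teacher denoiser
    disc = nn.Linear(d, 1)
    print(student_loss(student, teacher, disc, torch.randn(8, d)).item())
```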
arXiv Detail & Related papers (2023-11-28T18:53:24Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present BOOT, a novel technique that overcomes the limitations of existing distillation methods with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)