One Step Diffusion via Shortcut Models
- URL: http://arxiv.org/abs/2410.12557v1
- Date: Wed, 16 Oct 2024 13:34:40 GMT
- Title: One Step Diffusion via Shortcut Models
- Authors: Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel
- Abstract summary: We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples.
Shortcut models condition the network on the current noise level and also on the desired step size, allowing the model to skip ahead in the generation process.
Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.
- Abstract: Diffusion models and flow-matching models have enabled generating diverse and realistic images by learning to transfer noise to data. However, sampling from these models involves iterative denoising over many neural network passes, making generation slow and expensive. Previous approaches for speeding up sampling require complex training regimes, such as multiple training phases, multiple networks, or fragile scheduling. We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples in a single or multiple sampling steps. Shortcut models condition the network not only on the current noise level but also on the desired step size, allowing the model to skip ahead in the generation process. Across a wide range of sampling step budgets, shortcut models consistently produce higher quality samples than previous approaches, such as consistency models and reflow. Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.
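To make the conditioning on step size concrete, the sketch below shows how a trained shortcut model could be sampled under an arbitrary step budget. It is a minimal illustration assuming a flow-matching convention (noise at t = 0, data at t = 1) and a hypothetical model signature `model(x, t, d)` that returns a direction the sampler scales by the step size; the authors' actual interface may differ.

```python
import torch

@torch.no_grad()
def sample_shortcut(model, shape, num_steps=1, device="cpu"):
    # Start from pure noise at t = 0 (flow-matching convention: data lives at t = 1).
    x = torch.randn(shape, device=device)
    d = 1.0 / num_steps  # desired step size, also fed to the network
    for k in range(num_steps):
        t = torch.full((shape[0],), k * d, device=device)
        step = torch.full((shape[0],), d, device=device)
        # The network is conditioned on both the current noise level t and the
        # step size d, and (by assumption) predicts a direction s such that
        # x_{t+d} is approximately x_t + d * s.
        x = x + d * model(x, t, step)
    return x
```

With `num_steps=1` this reduces to one-step generation; larger budgets reuse the same network and weights, which is the inference-time flexibility the abstract highlights.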
Related papers
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along probability-flow (PF) ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
- A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks.
Our approach enables versatile capabilities via different inference-time sampling schemes.
Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
- Provable Statistical Rates for Consistency Diffusion Models [87.28777947976573]
Despite their state-of-the-art performance, diffusion models are known for slow sample generation due to the large number of denoising steps involved.
This paper contributes towards the first statistical theory for consistency models, formulating their training as a distribution discrepancy minimization problem.
arXiv Detail & Related papers (2024-06-23T20:34:18Z)
- Multistep Distillation of Diffusion Models via Moment Matching [29.235113968156433]
We present a new method for making diffusion models faster to sample.
The method distills many-step diffusion models into few-step models by matching conditional expectations of the clean data.
We obtain new state-of-the-art results on the ImageNet dataset.
arXiv Detail & Related papers (2024-06-06T14:20:21Z)
- Multistep Consistency Models [24.443707181138553]
A 1-step consistency model is a conventional consistency model, whereas an $\infty$-step consistency model is a diffusion model.
By increasing the sampling budget from a single step to 2-8 steps, we can more easily train models that generate higher-quality samples.
We show that our method scales to a text-to-image diffusion model, generating samples that are close to the quality of the original model.
arXiv Detail & Related papers (2024-03-11T15:26:34Z)
- One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image.
Our method enables fully offline training with just noise/image pairs from the diffusion model.
We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5\times$ larger ViT in terms of FID scores.
arXiv Detail & Related papers (2023-12-12T07:28:40Z)
- Consistency Models [89.68380014789861]
We propose a new family of models that generate high quality samples by directly mapping noise to data.
They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality.
They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training.
arXiv Detail & Related papers (2023-03-02T18:30:16Z)
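For contrast with the consistency-model baseline mentioned above, here is a generic sketch of multistep consistency sampling: a consistency function maps a noisy input directly to a clean estimate, and extra steps re-noise and map back to trade compute for sample quality. The signature `f(x, sigma)`, the noise schedule, and the plain Gaussian re-noising rule are simplifying assumptions, not the original implementation.

```python
import torch

@torch.no_grad()
def consistency_multistep(f, shape, sigmas, device="cpu"):
    # `sigmas` is a decreasing list of noise levels, e.g. [80.0, 20.0, 5.0, 1.0].
    x = torch.randn(shape, device=device) * sigmas[0]  # start from pure noise
    x0 = f(x, sigmas[0])                               # one-step generation
    for sigma in sigmas[1:]:
        x = x0 + sigma * torch.randn_like(x0)          # re-noise to a lower level
        x0 = f(x, sigma)                               # map straight back to data
    return x0
```

With a single entry in `sigmas` this is one-step generation; each additional level spends one more network pass to refine the estimate.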