Consistency Models
- URL: http://arxiv.org/abs/2303.01469v2
- Date: Wed, 31 May 2023 06:17:10 GMT
- Title: Consistency Models
- Authors: Yang Song, Prafulla Dhariwal, Mark Chen and Ilya Sutskever
- Abstract summary: We propose a new family of models that generate high quality samples by directly mapping noise to data.
They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality.
They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training.
- Score: 89.68380014789861
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have significantly advanced the fields of image, audio, and
video generation, but they depend on an iterative sampling process that causes
slow generation. To overcome this limitation, we propose consistency models, a
new family of models that generate high quality samples by directly mapping
noise to data. They support fast one-step generation by design, while still
allowing multistep sampling to trade compute for sample quality. They also
support zero-shot data editing, such as image inpainting, colorization, and
super-resolution, without requiring explicit training on these tasks.
Consistency models can be trained either by distilling pre-trained diffusion
models, or as standalone generative models altogether. Through extensive
experiments, we demonstrate that they outperform existing distillation
techniques for diffusion models in one- and few-step sampling, achieving the
new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for
one-step generation. When trained in isolation, consistency models become a new
family of generative models that can outperform existing one-step,
non-adversarial generative models on standard benchmarks such as CIFAR-10,
ImageNet 64x64 and LSUN 256x256.
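As a rough illustration of the one-step and multistep sampling described in the abstract, the sketch below replaces the trained network with a toy closed-form consistency function. The Gaussian data distribution, the noise range (`EPS`, `T_MAX`), the intermediate noise levels, and all function names are assumptions made for illustration and are not the authors' code.

```python
import numpy as np

EPS = 0.002        # smallest noise level (assumed value)
T_MAX = 80.0       # largest noise level (assumed value)
MU, S = 1.5, 0.5   # hypothetical toy data distribution N(MU, S^2)


def toy_consistency_fn(x, t):
    """Closed-form consistency function for Gaussian data with x_t = x_0 + t * z.

    Stand-in for a trained network: it maps a point at noise level t to the
    point at level EPS on the same probability-flow ODE trajectory, so
    toy_consistency_fn(x, EPS) returns x unchanged.
    """
    return MU + (x - MU) * np.sqrt(S**2 + EPS**2) / np.sqrt(S**2 + t**2)


def multistep_sample(f, shape, taus, rng):
    """One-step generation, then optional re-noise/denoise refinement steps.

    taus: decreasing noise levels in (EPS, T_MAX); an empty list means
    plain one-step generation.
    """
    x = f(rng.standard_normal(shape) * T_MAX, T_MAX)  # one-step sample from noise
    for tau in taus:
        z = rng.standard_normal(shape)
        x_tau = x + np.sqrt(tau**2 - EPS**2) * z  # re-noise up to level tau
        x = f(x_tau, tau)                         # map back to the data end
    return x


rng = np.random.default_rng(0)
one_step = multistep_sample(toy_consistency_fn, (10000,), [], rng)
three_step = multistep_sample(toy_consistency_fn, (10000,), [20.0, 5.0], rng)
print(one_step.mean(), one_step.std())
print(three_step.mean(), three_step.std())
```

Each entry in `taus` costs one extra function evaluation, which is the compute-for-quality trade the abstract refers to. With this exact toy function both variants already match the target distribution closely, so the benefit of extra steps only shows up with an imperfect learned model.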
Related papers
- Provable Statistical Rates for Consistency Diffusion Models [87.28777947976573]
Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved.
This paper contributes towards the first statistical theory for consistency models, formulating their training as a distribution discrepancy minimization problem.
arXiv Detail & Related papers (2024-06-23T20:34:18Z)
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation [52.509092010267665]
We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain.
It is an affirmative answer to whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals can achieve state-of-the-art image generation performance if scaled properly.
arXiv Detail & Related papers (2024-06-10T17:59:52Z)
- Multistep Distillation of Diffusion Models via Moment Matching [29.235113968156433]
We present a new method for making diffusion models faster to sample.
The method distills many-step diffusion models into few-step models by matching conditional expectations of the clean data.
We obtain new state-of-the-art results on the ImageNet dataset.
arXiv Detail & Related papers (2024-06-06T14:20:21Z)
- Directly Denoising Diffusion Models [6.109141407163027]
We present Directly Denoising Diffusion Model (DDDM), a simple and generic approach for generating realistic images with few-step sampling.
Our model achieves FID scores of 2.57 and 2.33 on CIFAR-10 in one-step and two-step sampling respectively, surpassing those obtained from GANs and distillation-based models.
For ImageNet 64x64, our approach stands as a competitive contender against leading models.
arXiv Detail & Related papers (2024-05-22T11:20:32Z)
- SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions [5.100085108873068]
We present two models, SDXS-512 and SDXS-1024, achieving inference speeds of approximately 100 FPS (30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU.
Our training approach offers promising applications in image-conditioned control, facilitating efficient image-to-image translation.
arXiv Detail & Related papers (2024-03-25T11:16:23Z)
- Multistep Consistency Models [24.443707181138553]
A 1-step consistency model is a conventional consistency model, whereas an $\infty$-step consistency model is a diffusion model.
By increasing the sample budget from a single step to 2-8 steps, we can more easily train models that generate higher-quality samples.
We show that our method scales to a text-to-image diffusion model, generating samples that are close to the quality of the original model.
arXiv Detail & Related papers (2024-03-11T15:26:34Z)
- On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained in pixel space, our approach is able to generate images visually comparable to those of the original model.
For diffusion models trained in latent space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z)
- Cascaded Diffusion Models for High Fidelity Image Generation [53.57766722279425]
We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge.
A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution.
We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation.
arXiv Detail & Related papers (2021-05-30T17:14:52Z)
- Improved Techniques for Training Score-Based Generative Models [104.20217659157701]
We provide a new theoretical analysis of learning and sampling from score models in high dimensional spaces.
We can effortlessly scale score-based generative models to images with unprecedented resolutions.
Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets.
arXiv Detail & Related papers (2020-06-16T09:17:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.