Diffusion Models Beat GANs on Image Synthesis
- URL: http://arxiv.org/abs/2105.05233v3
- Date: Thu, 13 May 2021 17:57:08 GMT
- Title: Diffusion Models Beat GANs on Image Synthesis
- Authors: Prafulla Dhariwal, Alex Nichol
- Abstract summary: We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models.
For conditional image synthesis, we further improve sample quality with classifier guidance.
We achieve an FID of 2.97 on ImageNet 128$\times$128, 4.59 on ImageNet 256$\times$256, and 7.72 on ImageNet 512$\times$512, and we match BigGAN-deep even with as few as 25 forward passes per sample.
- Score: 4.919647298882951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that diffusion models can achieve image sample quality superior to
the current state-of-the-art generative models. We achieve this on
unconditional image synthesis by finding a better architecture through a series
of ablations. For conditional image synthesis, we further improve sample
quality with classifier guidance: a simple, compute-efficient method for
trading off diversity for sample quality using gradients from a classifier. We
achieve an FID of 2.97 on ImageNet 128$\times$128, 4.59 on ImageNet
256$\times$256, and 7.72 on ImageNet 512$\times$512, and we match BigGAN-deep
even with as few as 25 forward passes per sample, all while maintaining better
coverage of the distribution. Finally, we find that classifier guidance
combines well with upsampling diffusion models, further improving FID to 3.85
on ImageNet 512$\times$512. We release our code at
https://github.com/openai/guided-diffusion
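The classifier-guidance idea in the abstract, shifting the reverse-process mean by $s\,\Sigma\,\nabla_x \log p(y \mid x_t)$, can be sketched in NumPy using a toy linear-softmax classifier, whose log-probability gradient has a closed form. All names, shapes, and the linear classifier itself are illustrative stand-ins, not the released guided-diffusion code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def grad_log_classifier(x, y, W, b):
    # For a linear-softmax classifier p(y|x) = softmax(Wx + b), the
    # gradient has the closed form
    #   grad_x log p(y|x) = W[y] - sum_k p(k|x) W[k].
    p = softmax(x @ W.T + b)          # (batch, classes)
    return W[y] - p @ W               # (batch, dim)

def guided_mean(mean, variance, x_t, y, W, b, scale=1.0):
    # Classifier guidance: shift the denoising mean by
    # scale * Sigma * grad_x log p(y | x_t).
    return mean + scale * variance * grad_log_classifier(x_t, y, W, b)
```

With `scale=0` this reduces to the unguided mean; larger scales trade sample diversity for fidelity to the class label, as the abstract describes.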
Related papers
- Diffusion Models Need Visual Priors for Image Generation [86.92260591389818]
Diffusion on Diffusion (DoD) is an innovative multi-stage generation framework that first extracts visual priors from previously generated samples, then provides rich guidance for the diffusion model.
We evaluate DoD on the popular ImageNet $256\times256$ dataset, reducing training cost by $7\times$ compared to SiT and DiT.
Our largest model DoD-XL achieves an FID-50K score of 1.83 with only 1 million training steps, which surpasses other state-of-the-art methods without bells and whistles during inference.
arXiv Detail & Related papers (2024-10-11T05:03:56Z)
- Guiding a Diffusion Model with a Bad Version of Itself [35.61297232307485]
We show that it is possible to obtain disentangled control over image quality, without compromising the amount of variation, by guiding generation with a smaller, less-trained version of the model itself rather than an unconditional model.
This leads to significant improvements in ImageNet generation, setting record FIDs of 1.01 for 64x64 and 1.25 for 512x512, using publicly available networks.
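Mechanically, this kind of guidance can be read as extrapolating away from the weaker model's denoising prediction; a minimal sketch, with hypothetical names for the two models' outputs:

```python
import numpy as np

def autoguided_denoise(d_main, d_weak, w):
    # Guidance as extrapolation: combine the two denoising predictions so
    # that w = 1 recovers the main model and w > 1 pushes the result
    # past it, away from the weaker model's prediction.
    return d_weak + w * (d_main - d_weak)
```

Setting `w=0` returns the weak model's prediction unchanged, while `w>1` amplifies the quality difference between the two models.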
arXiv Detail & Related papers (2024-06-04T17:25:59Z)
- One-step Diffusion with Distribution Matching Distillation [54.723565605974294]
We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator.
We enforce that the one-step image generator matches the diffusion model at the distribution level by minimizing an approximate KL divergence.
Our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k.
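One way to picture the distribution-matching step: the gradient of an approximate KL between the generator's distribution and the data distribution reduces to a difference of score estimates at the generated sample. The sketch below is a toy rendering of that idea, with `score_real` and `score_fake` as hypothetical stand-ins for the two score estimators; it is not the paper's training procedure.

```python
import numpy as np

def dmd_grad(x_fake, score_real, score_fake):
    # Approximate gradient of KL(p_fake || p_real) with respect to the
    # generator's output: the difference between the fake-distribution
    # and real-distribution score estimates at the generated sample.
    return score_fake(x_fake) - score_real(x_fake)
```

For 1-D Gaussians the scores are linear, so the sketch can be checked analytically: descending this gradient moves generated samples toward the real distribution.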
arXiv Detail & Related papers (2023-11-30T18:59:20Z)
- Synthetic Data from Diffusion Models Improves ImageNet Classification [47.999055841125156]
Large-scale text-to-image diffusion models can be fine-tuned to produce class-conditional models.
Augmenting the ImageNet training set with samples from the resulting models yields significant improvements in ImageNet classification accuracy.
arXiv Detail & Related papers (2023-04-17T17:42:29Z)
- On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained in pixel space, our approach generates images visually comparable to those of the original model.
For diffusion models trained in latent space (e.g., Stable Diffusion), our approach generates high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z)
- Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning [90.02873747873444]
Bit Diffusion is a generic approach for generating discrete data with continuous diffusion models.
The proposed approach can achieve strong performance in both discrete image generation and image captioning tasks.
For image captioning on the MS-COCO dataset, our approach achieves results competitive with autoregressive models.
arXiv Detail & Related papers (2022-08-08T15:08:40Z)
- Improving Diffusion Model Efficiency Through Patching [0.0]
We find that adding a simple ViT-style patching transformation can considerably reduce a diffusion model's sampling time and memory usage.
We justify our approach both through an analysis of the diffusion model objective and through empirical experiments on LSUN Church, ImageNet 256, and FFHQ 1024.
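A ViT-style patching transform of the kind described here can be sketched in NumPy: each $p \times p$ spatial patch is folded into the channel dimension, shrinking the spatial resolution the model must process by $p$ in each dimension. The exact layout below is one common convention, not necessarily the paper's.

```python
import numpy as np

def patchify(x, p):
    # (B, C, H, W) -> (B, C*p*p, H/p, W/p): fold each p x p patch into
    # channels, reducing spatial cost for the diffusion model.
    b, c, h, w = x.shape
    assert h % p == 0 and w % p == 0
    x = x.reshape(b, c, h // p, p, w // p, p)
    x = x.transpose(0, 1, 3, 5, 2, 4)
    return x.reshape(b, c * p * p, h // p, w // p)

def unpatchify(x, p, c):
    # Exact inverse of patchify: unfold channels back into p x p patches.
    b, cpp, hp, wp = x.shape
    x = x.reshape(b, c, p, p, hp, wp)
    x = x.transpose(0, 1, 4, 2, 5, 3)
    return x.reshape(b, c, hp * p, wp * p)
```

Because the transform is a lossless reshuffle, the model operates on the patched tensor and the output is unfolded back to full resolution at the end.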
arXiv Detail & Related papers (2022-07-09T18:21:32Z)
- Cascaded Diffusion Models for High Fidelity Image Generation [53.57766722279425]
We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge.
A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution.
We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation.
arXiv Detail & Related papers (2021-05-30T17:14:52Z)
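The cascaded pipeline, with the conditioning augmentation the summary highlights, can be sketched with stub samplers; `base`, the upsampler callables, and `aug_std` are hypothetical stand-ins for the paper's trained diffusion models and augmentation schedule.

```python
import numpy as np

def cascade_sample(base, upsamplers, rng, aug_std=0.1):
    # A cascade: the base model samples a low-resolution image, and each
    # super-resolution stage conditions on the previous stage's output.
    # Conditioning augmentation: corrupt the low-res conditioning input
    # with noise so each stage is robust to its predecessor's errors.
    x = base()
    for upsample in upsamplers:
        cond = x + rng.normal(scale=aug_std, size=x.shape)
        x = upsample(cond)
    return x
```

Each stage only has to solve a super-resolution problem, which is why the summary notes that sample quality hinges on how the conditioning input is augmented.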
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.