Cascaded Diffusion Models for High Fidelity Image Generation
- URL: http://arxiv.org/abs/2106.15282v1
- Date: Wed, 30 Jun 2021 17:14:52 GMT
- Title: Cascaded Diffusion Models for High Fidelity Image Generation
- Authors: Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans
- Abstract summary: We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge.
A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution.
We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that cascaded diffusion models are capable of generating high
fidelity images on the class-conditional ImageNet generation challenge, without
any assistance from auxiliary image classifiers to boost sample quality. A
cascaded diffusion model comprises a pipeline of multiple diffusion models that
generate images of increasing resolution, beginning with a standard diffusion
model at the lowest resolution, followed by one or more super-resolution
diffusion models that successively upsample the image and add higher resolution
details. We find that the sample quality of a cascading pipeline relies
crucially on conditioning augmentation, our proposed method of data
augmentation of the lower resolution conditioning inputs to the
super-resolution models. Our experiments show that conditioning augmentation
prevents compounding error during sampling in a cascaded model, helping us to
train cascading pipelines achieving FID scores of 1.48 at 64x64, 3.52 at
128x128 and 4.88 at 256x256 resolutions, outperforming BigGAN-deep.
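To make the cascade concrete, here is a minimal sampling sketch. The `base_model.sample` and `sr_model.sample` interfaces, the cosine schedule, and the augmentation level `t_aug` are illustrative assumptions, not the paper's released code; the paper evaluates both truncated and, as sketched here, Gaussian conditioning augmentation.

```python
import torch

def gaussian_cond_augment(lowres, t_aug, num_steps=1000):
    # Diffuse the low-resolution conditioning image forward to timestep t_aug
    # (assumed cosine noise schedule), matching the corruption the
    # super-resolution model saw on its conditioning inputs during training.
    s = torch.tensor(t_aug / num_steps)
    alpha_bar = torch.cos(s * torch.pi / 2) ** 2
    noise = torch.randn_like(lowres)
    return alpha_bar.sqrt() * lowres + (1.0 - alpha_bar).sqrt() * noise

def cascade_sample(base_model, sr_models, label, t_aug=100):
    # Stage 1: class-conditional base model at the lowest resolution (e.g., 32x32).
    x = base_model.sample(label)
    # Stages 2..N: super-resolution models (e.g., 32->64->128->256), each
    # conditioned on an augmented version of the previous stage's output so
    # that artifacts from earlier stages do not compound during sampling.
    for sr_model in sr_models:
        cond = gaussian_cond_augment(x, t_aug)
        x = sr_model.sample(label, lowres_cond=cond)
    return x
```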
Related papers
- Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion (2024-10-25)
We show that pixel-space models can in fact be very competitive with latent approaches in both quality and efficiency.
We present a simple recipe for scaling end-to-end pixel-space diffusion models to high resolutions.
- CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling (2023-10-26)
Condition-Annealed Diffusion Sampler (CADS) can be used with any pretrained model and sampling algorithm.
We show that it boosts the diversity of diffusion models in various conditional generation tasks.
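A minimal sketch of the annealing idea, assuming a piecewise-linear schedule and constants chosen purely for illustration:

```python
import torch

def cads_condition(y, t, tau1=0.6, tau2=0.9, noise_scale=0.25):
    # Normalized time t runs from 1 (start of sampling, pure noise) down to 0.
    # The condition y is heavily corrupted early in sampling, which restores
    # sample diversity, and is passed through unchanged near the end.
    if t <= tau1:
        gamma = 1.0   # condition fully trusted late in sampling
    elif t >= tau2:
        gamma = 0.0   # condition fully annealed away at the start
    else:
        gamma = (tau2 - t) / (tau2 - tau1)
    noise = torch.randn_like(y)
    return (gamma ** 0.5) * y + noise_scale * ((1.0 - gamma) ** 0.5) * noise
```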
- ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models (2023-10-11)
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes.
Existing approaches to higher-resolution generation, such as attention-based and joint-diffusion methods, cannot adequately address the artifacts that arise at these scales.
We propose a simple yet effective re-dilation that dynamically adjusts the convolutional receptive field during inference.
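A simplified stand-in for the idea follows; the actual method is more selective about which layers, timesteps, and dilation factors it modifies.

```python
import torch.nn as nn

def redilate(model, factor):
    # Multiply the dilation of every stride-1 3x3 convolution (and its padding,
    # to preserve spatial size), enlarging the effective receptive field at
    # inference time without any retraining.
    for m in model.modules():
        if isinstance(m, nn.Conv2d) and m.kernel_size == (3, 3) and m.stride == (1, 1):
            m.dilation = (factor, factor)
            m.padding = (factor, factor)
```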
- ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution (2023-07-03)
We propose a diffusion model-based super-resolution method called ACDMSR.
Our method adapts the standard diffusion model to perform super-resolution through a deterministic iterative denoising process.
Our approach generates more visually realistic counterparts for low-resolution images, demonstrating its effectiveness in practical scenarios.
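A sketch of deterministic, DDIM-style conditional denoising for super-resolution; `eps_model` is a hypothetical conditional noise predictor, and plain bicubic upsampling stands in for the conditioning estimate the paper actually uses.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def deterministic_sr(eps_model, lowres, alphas_bar, timesteps, scale=4):
    # `timesteps` is a descending list of ints indexing the precomputed
    # cumulative-product noise schedule `alphas_bar` (a 1-D tensor).
    lr_up = F.interpolate(lowres, scale_factor=scale, mode="bicubic")
    x = torch.randn_like(lr_up)
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        a_t, a_prev = alphas_bar[t], alphas_bar[t_prev]
        eps = eps_model(x, lr_up, t)
        x0 = (x - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()      # predicted clean image
        x = a_prev.sqrt() * x0 + (1.0 - a_prev).sqrt() * eps  # deterministic (eta=0) step
    return x
```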
- Consistency Models (2023-03-02)
We propose a new family of models that generate high quality samples by directly mapping noise to data.
They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality.
They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training.
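A minimal sketch of one-step and multistep consistency sampling, assuming a hypothetical consistency function `f(x, sigma)` that maps a noisy input at noise level sigma directly to a clean sample:

```python
import torch

@torch.no_grad()
def consistency_sample(f, shape, sigmas, sigma_min=0.002, device="cpu"):
    # The first call alone is one-step generation; each additional (descending)
    # noise level re-noises the sample and maps it back, trading compute for quality.
    x = sigmas[0] * torch.randn(shape, device=device)
    sample = f(x, sigmas[0])
    for sigma in sigmas[1:]:
        z = torch.randn_like(sample)
        x = sample + (sigma**2 - sigma_min**2) ** 0.5 * z  # re-noise to level sigma
        sample = f(x, sigma)
    return sample
```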
- SinDiffusion: Learning a Diffusion Model from a Single Natural Image (2022-11-22)
We present SinDiffusion, which leverages denoising diffusion models to capture the internal distribution of patches from a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
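As a toy illustration of the patch-level receptive-field point (timestep conditioning and the paper's actual architecture are omitted):

```python
import torch.nn as nn

class PatchDenoiser(nn.Module):
    # Fully convolutional denoiser whose receptive field is deliberately small
    # (seven 3x3 convs -> about 15x15 pixels), so it can only model local patch
    # statistics of the single training image rather than its global layout.
    def __init__(self, ch=64):
        super().__init__()
        layers = [nn.Conv2d(3, ch, 3, padding=1), nn.SiLU()]
        for _ in range(5):
            layers += [nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU()]
        layers += [nn.Conv2d(ch, 3, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```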
- High-Resolution Image Editing via Multi-Stage Blended Diffusion (2022-10-24)
We propose an approach that uses a pre-trained low-resolution diffusion model to edit images in the megapixel range.
We first use Blended Diffusion to edit the image at a low resolution, and then upscale it in multiple stages, using a super-resolution model and Blended Diffusion.
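A minimal sketch of the multi-stage scheme, with `blended_edit` and `upscale` as hypothetical stand-ins for the pretrained low-resolution diffusion model and the super-resolution model:

```python
def multistage_edit(blended_edit, upscale, image, mask, prompt, stages=2):
    # Edit once at low resolution, then alternate super-resolution with another
    # mask-restricted blended-diffusion pass at each new scale.
    x = blended_edit(image, mask, prompt)   # initial edit, e.g. at 256x256
    for _ in range(stages):
        x = upscale(x)                      # super-resolution model, e.g. 2x per stage
        mask = upscale(mask)                # track the edited region across scales
        x = blended_edit(x, mask, prompt)   # refine the edit at the new resolution
    return x
```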
- On Distillation of Guided Diffusion Models (2022-10-06)
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained in pixel space, our approach generates images visually comparable to those of the original model.
For diffusion models trained in latent space (e.g., Stable Diffusion), our approach generates high-fidelity images using as few as 1 to 4 denoising steps.
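A sketch of the first distillation stage, in which a student conditioned on the guidance weight w regresses onto the classifier-free-guided teacher output; the interfaces are hypothetical, and the subsequent progressive halving of sampling steps is omitted.

```python
import torch
import torch.nn.functional as F

def guided_distill_loss(student, teacher, x_t, t, cond, w):
    with torch.no_grad():
        eps_cond = teacher(x_t, t, cond)              # conditional prediction
        eps_uncond = teacher(x_t, t, None)            # unconditional prediction
        target = (1 + w) * eps_cond - w * eps_uncond  # classifier-free guidance
    return F.mse_loss(student(x_t, t, cond, w), target)
```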