Simple diffusion: End-to-end diffusion for high resolution images
- URL: http://arxiv.org/abs/2301.11093v2
- Date: Tue, 12 Dec 2023 14:00:08 GMT
- Title: Simple diffusion: End-to-end diffusion for high resolution images
- Authors: Emiel Hoogeboom, Jonathan Heek, Tim Salimans
- Abstract summary: This paper aims to improve denoising diffusion for high resolution images while keeping the model as simple as possible.
The four main findings are: 1) the noise schedule should be adjusted for high resolution images, 2) It is sufficient to scale only a particular part of the architecture, 3) dropout should be added at specific locations in the architecture, and 4) downsampling is an effective strategy to avoid high resolution feature maps.
- Score: 27.47227724865238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Currently, applying diffusion models in pixel space of high resolution images
is difficult. Instead, existing approaches focus on diffusion in lower
dimensional spaces (latent diffusion), or have multiple super-resolution levels
of generation referred to as cascades. The downside is that these approaches
add additional complexity to the diffusion framework.
This paper aims to improve denoising diffusion for high resolution images
while keeping the model as simple as possible. The paper is centered around the
research question: How can one train a standard denoising diffusion models on
high resolution images, and still obtain performance comparable to these
alternate approaches?
The four main findings are: 1) the noise schedule should be adjusted for high
resolution images, 2) It is sufficient to scale only a particular part of the
architecture, 3) dropout should be added at specific locations in the
architecture, and 4) downsampling is an effective strategy to avoid high
resolution feature maps. Combining these simple yet effective techniques, we
achieve state-of-the-art on image generation among diffusion models without
sampling modifiers on ImageNet.
Related papers
- Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion [34.70370851239368]
We show that pixel-space models can in fact be very competitive to latent approaches both in quality and efficiency.
We present a simple recipe for scaling end-to-end pixel-space diffusion models to high resolutions.
arXiv Detail & Related papers (2024-10-25T06:20:06Z) - DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance [11.44012694656102]
Large-scale generative models, such as text-to-image diffusion models, have garnered widespread attention across diverse domains.
Existing large-scale diffusion models are confined to generating images of up to 1K resolution.
We propose a novel progressive approach that fully utilizes generated low-resolution images to guide the generation of higher-resolution images.
arXiv Detail & Related papers (2024-06-26T16:10:31Z) - Diffusion Models Without Attention [110.5623058129782]
Diffusion State Space Model (DiffuSSM) is an architecture that supplants attention mechanisms with a more scalable state space model backbone.
Our focus on FLOP-efficient architectures in diffusion training marks a significant step forward.
arXiv Detail & Related papers (2023-11-30T05:15:35Z) - ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with
Diffusion Models [126.35334860896373]
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes.
Existing works for higher-resolution generation, such as attention-based and joint-diffusion approaches, cannot well address these issues.
We propose a simple yet effective re-dilation that can dynamically adjust the convolutional perception field during inference.
arXiv Detail & Related papers (2023-10-11T17:52:39Z) - Prompt-tuning latent diffusion models for inverse problems [72.13952857287794]
We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors.
Our method, called P2L, outperforms both image- and latent-diffusion model-based inverse problem solvers on a variety of tasks, such as super-resolution, deblurring, and inpainting.
arXiv Detail & Related papers (2023-10-02T11:31:48Z) - Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using processes.
For many applications such as image editing, the model input comes from a distribution that is not random noise.
In our work, we propose Denoising Diffusion Bridge Models (DDBMs)
arXiv Detail & Related papers (2023-09-29T03:24:24Z) - LLDiffusion: Learning Degradation Representations in Diffusion Models
for Low-Light Image Enhancement [118.83316133601319]
Current deep learning methods for low-light image enhancement (LLIE) typically rely on pixel-wise mapping learned from paired data.
We propose a degradation-aware learning scheme for LLIE using diffusion models, which effectively integrates degradation and image priors into the diffusion process.
arXiv Detail & Related papers (2023-07-27T07:22:51Z) - Real-World Denoising via Diffusion Model [14.722529440511446]
Real-world image denoising aims to recover clean images from noisy images captured in natural environments.
diffusion models have achieved very promising results in the field of image generation, outperforming previous generation models.
This paper proposes a novel general denoising diffusion model that can be used for real-world image denoising.
arXiv Detail & Related papers (2023-05-08T04:48:03Z) - SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z) - High-Resolution Image Editing via Multi-Stage Blended Diffusion [3.834509400202395]
We propose an approach that uses a pre-trained low-resolution diffusion model to edit images in the megapixel range.
We first use Blended Diffusion to edit the image at a low resolution, and then upscale it in multiple stages, using a super-resolution model and Blended Diffusion.
arXiv Detail & Related papers (2022-10-24T06:07:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.