High-Resolution Image Synthesis with Latent Diffusion Models
- URL: http://arxiv.org/abs/2112.10752v1
- Date: Mon, 20 Dec 2021 18:55:25 GMT
- Title: High-Resolution Image Synthesis with Latent Diffusion Models
- Authors: Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick
Esser and Bj\"orn Ommer
- Abstract summary: Training diffusion models on autoencoders allows for the first time to reach a near-optimal point between complexity reduction and detail preservation.
Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks.
- Score: 14.786952412297808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By decomposing the image formation process into a sequential application of
denoising autoencoders, diffusion models (DMs) achieve state-of-the-art
synthesis results on image data and beyond. Additionally, their formulation
allows for a guiding mechanism to control the image generation process without
retraining. However, since these models typically operate directly in pixel
space, optimization of powerful DMs often consumes hundreds of GPU days and
inference is expensive due to sequential evaluations. To enable DM training on
limited computational resources while retaining their quality and flexibility,
we apply them in the latent space of powerful pretrained autoencoders. In
contrast to previous work, training diffusion models on such a representation
allows for the first time to reach a near-optimal point between complexity
reduction and detail preservation, greatly boosting visual fidelity. By
introducing cross-attention layers into the model architecture, we turn
diffusion models into powerful and flexible generators for general conditioning
inputs such as text or bounding boxes and high-resolution synthesis becomes
possible in a convolutional manner. Our latent diffusion models (LDMs) achieve
a new state of the art for image inpainting and highly competitive performance
on various tasks, including unconditional image generation, semantic scene
synthesis, and super-resolution, while significantly reducing computational
requirements compared to pixel-based DMs. Code is available at
https://github.com/CompVis/latent-diffusion .
Related papers
- Visual Autoregressive Modeling for Image Super-Resolution [14.935662351654601]
We propose a novel visual autoregressive modeling for ISR framework with the form of next-scale prediction.
We collect large-scale data and design a training process to obtain robust generative priors.
arXiv Detail & Related papers (2025-01-31T09:53:47Z) - Nested Diffusion Models Using Hierarchical Latent Priors [23.605302440082994]
We introduce nested diffusion models, an efficient and powerful hierarchical generative framework.
Our approach employs a series of diffusion models to progressively generate latent variables at different semantic levels.
To construct these latent variables, we leverage a pre-trained visual encoder, which learns strong semantic visual representations.
arXiv Detail & Related papers (2024-12-08T16:13:39Z) - MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization line of method, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR demonstrates much more superior performance than other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z) - CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster
Image Generation [49.3016007471979]
Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks.
However, their widespread adoption is hindered by the high computational cost, which limits their real-time application.
We introduce a novel method dubbed CoDi, that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs.
arXiv Detail & Related papers (2023-10-02T17:59:18Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z) - Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv Detail & Related papers (2023-05-11T17:55:25Z) - Representation Learning with Diffusion Models [0.0]
Diffusion models (DMs) have achieved state-of-the-art results for image synthesis tasks as well as density estimation.
We introduce a framework for learning such representations with diffusion models (LRDM)
In particular, the DM and the representation encoder are trained jointly in order to learn rich representations specific to the generative denoising process.
arXiv Detail & Related papers (2022-10-20T07:26:47Z) - DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis.
Our model achieves state-of-the-art results and generates more photorealistic images specifically.
arXiv Detail & Related papers (2022-06-01T10:39:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.