Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance
- URL: http://arxiv.org/abs/2210.05559v1
- Date: Tue, 11 Oct 2022 15:53:52 GMT
- Title: Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance
- Authors: Chen Henry Wu, Fernando De la Torre
- Abstract summary: We show that a common latent space emerges from two diffusion models trained independently on related domains.
Applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors.
- Score: 95.12230117950232
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have achieved unprecedented performance in generative
modeling. The commonly adopted formulation of the latent code of diffusion
models is a sequence of gradually denoised samples, as opposed to the simpler
(e.g., Gaussian) latent space of GANs, VAEs, and normalizing flows. This paper
provides an alternative, Gaussian formulation of the latent space of various
diffusion models, as well as an invertible DPM-Encoder that maps images into
the latent space. While our formulation is purely based on the definition of
diffusion models, we demonstrate several intriguing consequences. (1)
Empirically, we observe that a common latent space emerges from two diffusion
models trained independently on related domains. In light of this finding, we
propose CycleDiffusion, which uses DPM-Encoder for unpaired image-to-image
translation. Furthermore, applying CycleDiffusion to text-to-image diffusion
models, we show that large-scale text-to-image diffusion models can be used as
zero-shot image-to-image editors. (2) One can guide pre-trained diffusion
models and GANs by controlling the latent codes in a unified, plug-and-play
formulation based on energy-based models. Using the CLIP model and a face
recognition model as guidance, we demonstrate that diffusion models have better
coverage of low-density sub-populations and individuals than GANs.
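The DPM-Encoder and CycleDiffusion described above admit a compact summary: diffuse the input image forward to obtain a trajectory x_T, ..., x_1, x_0, record the Gaussian noises that the pre-trained reverse process would need to retrace that exact trajectory, and treat z = (x_T, eps_T, ..., eps_1) as the latent code; decoding the same latent with a second diffusion model trained on a related domain then performs unpaired translation. Below is a minimal sketch of that recipe under standard DDPM definitions; the `eps_model` noise predictor is assumed to be given, and the helper names and linear beta schedule are our own illustration, not the authors' released code.

```python
import torch

# A minimal DPM-Encoder / CycleDiffusion sketch under standard DDPM
# definitions. `eps_model(x_t, t)` is an assumed pre-trained noise predictor
# (real models expect a tensor timestep; an int is used here for brevity).

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # illustrative linear schedule
alphas = 1.0 - betas
abar = torch.cumprod(alphas, dim=0)                 # alpha-bar_t
abar_prev = torch.cat([torch.ones(1), abar[:-1]])

def mu_theta(eps_model, x_t, t):
    """Reverse-process mean mu_theta(x_t, t) of a DDPM."""
    eps_hat = eps_model(x_t, t)
    return (x_t - betas[t] / (1 - abar[t]).sqrt() * eps_hat) / alphas[t].sqrt()

def sigma(t):
    """Reverse-process std (the common beta-tilde choice)."""
    return ((1 - abar_prev[t]) / (1 - abar[t]) * betas[t]).sqrt()

@torch.no_grad()
def dpm_encode(x0, eps_model):
    """Map x0 to a Gaussian latent z = (x_T, eps_T, ..., eps_1): sample a
    forward trajectory, then record the noises the reverse process would
    need to retrace it exactly."""
    x_t = abar[T - 1].sqrt() * x0 + (1 - abar[T - 1]).sqrt() * torch.randn_like(x0)
    z = [x_t]
    for t in range(T - 1, 0, -1):
        # Sample the previous step from the forward posterior q(x_{t-1} | x_t, x_0).
        mean_q = (abar_prev[t].sqrt() * betas[t] * x0
                  + alphas[t].sqrt() * (1 - abar_prev[t]) * x_t) / (1 - abar[t])
        x_prev = mean_q + sigma(t) * torch.randn_like(x_t)
        # Solve x_{t-1} = mu_theta(x_t, t) + sigma_t * eps_t for eps_t.
        z.append((x_prev - mu_theta(eps_model, x_t, t)) / sigma(t))
        x_t = x_prev
    return z

@torch.no_grad()
def dpm_decode(z, eps_model):
    """Replay the reverse process deterministically under the fixed noises."""
    x_t, eps = z[0], z[1:]
    for i, t in enumerate(range(T - 1, 0, -1)):
        x_t = mu_theta(eps_model, x_t, t) + sigma(t) * eps[i]
    return x_t

# CycleDiffusion: encode with the source-domain DPM, decode the same latent
# with an independently trained target-domain DPM.
# x_translated = dpm_decode(dpm_encode(x_source, src_eps_model), tgt_eps_model)
```

By construction, decoding a latent with the same model reproduces the input exactly, so any change in the output comes from swapping the decoder: a model trained on a related domain, or a text-to-image DPM conditioned on a different prompt.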
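For point (2), a hedged paraphrase of the unified, plug-and-play guidance is an energy-based posterior over this Gaussian latent; the symbols below (generator G, energy E, weight lambda) are our notation, not formulas quoted from the paper:

```latex
% Guidance as an energy-based model over the Gaussian latent z (our notation):
% G is the pre-trained generator (a DPM decoded under fixed noises, or a GAN),
% E an energy such as a CLIP dissimilarity or a face-ID distance, \lambda a weight.
p(z \mid c) \;\propto\; \mathcal{N}(z;\, 0,\, I)\, \exp\!\bigl(-\lambda\, E(G(z),\, c)\bigr)
```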
Related papers
- Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For an efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Soft Mixture Denoising: Beyond the Expressive Bottleneck of Diffusion Models [76.46246743508651]
We show that current diffusion models actually have an expressive bottleneck in backward denoising.
We introduce soft mixture denoising (SMD), an expressive and efficient model for backward denoising.
arXiv Detail & Related papers (2023-09-25T12:03:32Z)
- Infinite-Dimensional Diffusion Models [4.342241136871849]
We formulate diffusion-based generative models in infinite dimensions and apply them to the generative modeling of functions.
We show that our formulations are well posed in the infinite-dimensional setting and provide dimension-independent distance bounds from the sample to the target measure.
We also develop guidelines for the design of infinite-dimensional diffusion models.
arXiv Detail & Related papers (2023-02-20T18:00:38Z)
- SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, which leverages denoising diffusion models to capture the internal distribution of patches within a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z)
- Blurring Diffusion Models [27.339469450737525]
We show that blurring can equivalently be defined through a Gaussian diffusion process with non-isotropic noise (see the sketch after this list).
We propose a class of diffusion models that offers the best of both standard Gaussian denoising diffusion and inverse heat dissipation.
arXiv Detail & Related papers (2022-09-12T19:16:48Z)
- Diffusion Models in Vision: A Survey [80.82832715884597]
A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage.
Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens.
arXiv Detail & Related papers (2022-09-10T22:00:30Z)
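The equivalence claimed in the Blurring Diffusion Models entry can be made concrete: written in the DCT (frequency) basis, heat-dissipation blurring is an ordinary Gaussian diffusion whose signal decay is per-frequency rather than a single scalar, i.e., non-isotropic. The sketch below is our illustration under an assumed exponential dissipation schedule; the function name and schedule are not taken from that paper.

```python
import numpy as np
from scipy.fft import dctn, idctn

def blurred_marginal(x0, t, sigma_t, dissipation=1.0):
    """Forward marginal of a blurring diffusion at time t: in the DCT basis
    the signal decay alpha_t varies per frequency (non-isotropic Gaussian
    diffusion), which in pixel space is heat-equation blurring plus noise.
    Illustrative schedule, not the paper's."""
    h, w = x0.shape
    # Squared spatial frequencies: Laplacian eigenvalues in the DCT basis.
    fy = (np.pi * np.arange(h) / h) ** 2
    fx = (np.pi * np.arange(w) / w) ** 2
    lam = fy[:, None] + fx[None, :]
    alpha_t = np.exp(-dissipation * lam * t)     # high frequencies decay faster
    u_t = alpha_t * dctn(x0, norm="ortho")       # diffuse per frequency
    mean = idctn(u_t, norm="ortho")              # back to pixel space: blurred x0
    return mean + sigma_t * np.random.randn(h, w)
```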