Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
- URL: http://arxiv.org/abs/2312.04410v1
- Date: Thu, 7 Dec 2023 16:26:23 GMT
- Title: Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
- Authors: Jiayi Guo, Xingqian Xu, Yifan Pu, Zanlin Ni, Chaofei Wang, Manushree
Vasu, Shiji Song, Gao Huang, Humphrey Shi
- Abstract summary: Smooth latent spaces ensure that a perturbation on an input latent corresponds to a steady change in the output image.
This property proves beneficial in downstream tasks, including image inversion, inversion, and editing.
We propose Smooth Diffusion, a new category of diffusion models that can be simultaneously high-performing and smooth.
- Score: 82.8261101680427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, diffusion models have made remarkable progress in text-to-image
(T2I) generation, synthesizing images with high fidelity and diverse contents.
Despite this advancement, latent space smoothness within diffusion models
remains largely unexplored. Smooth latent spaces ensure that a perturbation on
an input latent corresponds to a steady change in the output image. This
property proves beneficial in downstream tasks, including image interpolation,
inversion, and editing. In this work, we expose the non-smoothness of diffusion
latent spaces by observing noticeable visual fluctuations resulting from minor
latent variations. To tackle this issue, we propose Smooth Diffusion, a new
category of diffusion models that can be simultaneously high-performing and
smooth. Specifically, we introduce Step-wise Variation Regularization to
enforce the proportion between the variations of an arbitrary input latent and
that of the output image is a constant at any diffusion training step. In
addition, we devise an interpolation standard deviation (ISTD) metric to
effectively assess the latent space smoothness of a diffusion model. Extensive
quantitative and qualitative experiments demonstrate that Smooth Diffusion
stands out as a more desirable solution not only in T2I generation but also
across various downstream tasks. Smooth Diffusion is implemented as a
plug-and-play Smooth-LoRA to work with various community models. Code is
available at https://github.com/SHI-Labs/Smooth-Diffusion.
Related papers
- Towards diffusion models for large-scale sea-ice modelling [0.4498088099418789]
We tailor latent diffusion models to sea-ice physics with a censored Gaussian distribution in data space to generate data that follows the physical bounds of the modelled variables.
Our latent diffusion models reach similar scores as the diffusion model trained in data space, but they smooth the generated fields as caused by the latent mapping.
For large-scale Earth system modelling, latent diffusion models can have many advantages compared to diffusion in data space if the significant barrier of smoothing can be resolved.
arXiv Detail & Related papers (2024-06-26T15:11:15Z) - Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment [56.609042046176555]
suboptimal noise-data mapping leads to slow training of diffusion models.
Drawing inspiration from the immiscibility phenomenon in physics, we propose Immiscible Diffusion.
Our approach is remarkably simple, requiring only one line of code to restrict the diffuse-able area for each image.
arXiv Detail & Related papers (2024-06-18T06:20:42Z) - DiffMorpher: Unleashing the Capability of Diffusion Models for Image
Morphing [28.593023489682654]
We present DiffMorpher, the first approach enabling smooth and natural image morphing using diffusion models.
Our key idea is to capture the semantics of the two images by fitting two LoRAs to them respectively, and interpolate between both the LoRA parameters and the latent noises to ensure a smooth semantic transition.
In addition, we propose an attention and injection technique and a new sampling schedule to further enhance the smoothness between consecutive images.
arXiv Detail & Related papers (2023-12-12T16:28:08Z) - Multi-scale Diffusion Denoised Smoothing [79.95360025953931]
randomized smoothing has become one of a few tangible approaches that offers adversarial robustness to models at scale.
We present scalable methods to address the current trade-off between certified robustness and accuracy in denoised smoothing.
Our experiments show that the proposed multi-scale smoothing scheme combined with diffusion fine-tuning enables strong certified robustness available with high noise level.
arXiv Detail & Related papers (2023-10-25T17:11:21Z) - Eliminating Lipschitz Singularities in Diffusion Models [51.806899946775076]
We show that diffusion models frequently exhibit the infinite Lipschitz near the zero point of timesteps.
This poses a threat to the stability and accuracy of the diffusion process, which relies on integral operations.
We propose a novel approach, dubbed E-TSDM, which eliminates the Lipschitz of the diffusion model near zero.
arXiv Detail & Related papers (2023-06-20T03:05:28Z) - Diffusion Models Generate Images Like Painters: an Analytical Theory of Outline First, Details Later [1.8416014644193066]
We observe that the reverse diffusion process that underlies image generation has the following properties.
Individual trajectories tend to be low-dimensional and resemble 2D rotations'
We find that this solution accurately describes the initial phase of image generation for pretrained models.
arXiv Detail & Related papers (2023-03-04T20:08:57Z) - SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z) - Unifying Diffusion Models' Latent Space, with Applications to
CycleDiffusion and Guidance [95.12230117950232]
We show that a common latent space emerges from two diffusion models trained independently on related domains.
Applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors.
arXiv Detail & Related papers (2022-10-11T15:53:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.