Diffusion Models already have a Semantic Latent Space
- URL: http://arxiv.org/abs/2210.10960v2
- Date: Wed, 29 Mar 2023 06:39:50 GMT
- Title: Diffusion Models already have a Semantic Latent Space
- Authors: Mingi Kwon, Jaeseok Jeong, Youngjung Uh
- Abstract summary: We propose asymmetric reverse process (Asyrp) which discovers the semantic latent space in frozen pretrained diffusion models.
Our semantic latent space, named h-space, has nice properties for accommodating semantic image manipulation.
In addition, we introduce a principled design of the generative process for versatile editing and quality boosting by quantifiable measures.
- Score: 7.638042073679074
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Diffusion models achieve outstanding generative performance in various
domains. Despite their great success, they lack semantic latent space which is
essential for controlling the generative process. To address the problem, we
propose asymmetric reverse process (Asyrp) which discovers the semantic latent
space in frozen pretrained diffusion models. Our semantic latent space, named
h-space, has nice properties for accommodating semantic image manipulation:
homogeneity, linearity, robustness, and consistency across timesteps. In
addition, we introduce a principled design of the generative process for
versatile editing and quality boosting by quantifiable measures: editing
strength of an interval and quality deficiency at a timestep. Our method is
applicable to various architectures (DDPM++, iDDPM, and ADM) and datasets
(CelebA-HQ, AFHQ-dog, LSUN-church, LSUN-bedroom, and METFACES). Project page:
https://kwonminki.github.io/Asyrp/
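The asymmetry in Asyrp's reverse step can be illustrated with a small numpy sketch. This is a schematic simplification, not the authors' implementation: `eps` and `eps_shifted` are placeholders for the U-Net's noise prediction with the original and the Δh-shifted bottleneck feature, respectively, and `alpha_t`, `alpha_prev` are the cumulative noise-schedule products at the current and previous timestep.

```python
import numpy as np

def asyrp_reverse_step(x_t, alpha_t, alpha_prev, eps, eps_shifted):
    """One DDIM-style reverse step with Asyrp's asymmetry:
    the predicted-x0 term P_t uses the *shifted* noise prediction
    (from injecting delta-h into the U-Net bottleneck), while the
    direction term D_t keeps the *original* prediction, so editing
    the content does not also corrupt the denoising direction."""
    # P_t: predicted clean image x0, computed from the shifted epsilon
    pred_x0 = (x_t - np.sqrt(1.0 - alpha_t) * eps_shifted) / np.sqrt(alpha_t)
    # D_t: direction pointing back toward x_t, from the original epsilon
    direction = np.sqrt(1.0 - alpha_prev) * eps
    return np.sqrt(alpha_prev) * pred_x0 + direction
```

When `eps_shifted` equals `eps` (no edit applied), the step reduces to the ordinary deterministic DDIM update, which is why Asyrp preserves reconstruction outside the editing interval.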
Related papers
- RecTok: Reconstruction Distillation along Rectified Flow [85.51292475005151]
We propose RecTok, which overcomes the limitations of high-dimensional visual tokenizers through two key innovations.
Our method distills the semantic information in VFMs into the forward flow trajectories in flow matching.
Our RecTok achieves superior image reconstruction, generation quality, and discriminative performance.
arXiv Detail & Related papers (2025-12-15T15:14:20Z) - Variational Masked Diffusion Models [8.801239075625151]
Variational Masked Diffusion (VMD) is a framework that introduces latent variables into the masked diffusion process.
We demonstrate that VMD successfully learns dependencies that conventional masked diffusion fails to capture.
arXiv Detail & Related papers (2025-10-27T17:59:57Z) - Authentic Discrete Diffusion Model [72.31371542619121]
The Authentic Discrete Diffusion (ADD) framework redefines prior pseudo-discrete approaches.
ADD reformulates the diffusion input by directly using float-encoded one-hot class data.
Experiments demonstrate that ADD achieves superior performance on classification tasks compared to the baseline.
arXiv Detail & Related papers (2025-10-01T15:51:10Z) - FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models.
We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components.
FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
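The frequency-band scaling that FreSca describes can be sketched in a few lines of numpy. This is a hedged illustration of the general idea, not the paper's code: `delta_eps` stands in for the noise-difference map, and the cutoff radius and band scales are made-up parameters.

```python
import numpy as np

def frequency_band_scale(delta_eps, low_scale=1.0, high_scale=1.25, cutoff=0.25):
    """Split a 2D noise-difference map into low/high spatial-frequency
    bands with an FFT mask and scale each band independently.
    Training-free: operates on the map itself, not on model weights."""
    h, w = delta_eps.shape[-2:]
    # Center the spectrum so low frequencies sit at the middle
    spectrum = np.fft.fftshift(np.fft.fft2(delta_eps), axes=(-2, -1))
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    # Normalized radial distance from the spectrum center
    radius = np.sqrt(((yy - cy) / h) ** 2 + ((xx - cx) / w) ** 2)
    gains = np.where(radius <= cutoff, low_scale, high_scale)
    scaled = spectrum * gains
    out = np.fft.ifft2(np.fft.ifftshift(scaled, axes=(-2, -1)))
    return out.real
```

With both scales set to 1.0 the transform is the identity, so the edit strength can be dialed in per band without touching the model.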
arXiv Detail & Related papers (2025-04-02T22:03:11Z) - CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation [11.170848285659572]
Autoencoder accuracy on segmentation masks using quantized embeddings is 8% lower than with continuous-valued embeddings.
We propose a continuous-valued embedding framework for semantic segmentation.
Our approach eliminates the need for discrete latent representations while preserving fine-grained semantic details.
arXiv Detail & Related papers (2025-03-19T18:06:54Z) - Aggregation of Multi Diffusion Models for Enhancing Learned Representations [4.126721111013567]
This paper introduces a novel algorithm, Aggregation of Multi Diffusion Models (AMDM)
AMDM synthesizes features from multiple diffusion models into a specified model, enhancing its learned representations to activate specific features for fine-grained control.
Experimental results demonstrate that AMDM significantly improves fine-grained control without additional training or inference time.
arXiv Detail & Related papers (2024-10-02T06:16:06Z) - Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z) - Causal Diffusion Autoencoders: Toward Counterfactual Generation via Diffusion Probabilistic Models [17.124075103464392]
Diffusion models (DPMs) have become the state-of-the-art in high-quality image generation.
DPMs have an arbitrary noisy latent space with no interpretable or controllable semantics.
We propose CausalDiffAE, a diffusion-based causal representation learning framework to enable counterfactual generation.
arXiv Detail & Related papers (2024-04-27T00:09:26Z) - Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models [82.8261101680427]
Smooth latent spaces ensure that a perturbation on an input latent corresponds to a steady change in the output image.
This property proves beneficial in downstream tasks, including image interpolation, image inversion, and image editing.
We propose Smooth Diffusion, a new category of diffusion models that can be simultaneously high-performing and smooth.
arXiv Detail & Related papers (2023-12-07T16:26:23Z) - Mirror Diffusion Models for Constrained and Watermarked Generation [41.27274841596343]
Mirror Diffusion Models (MDM) is a new class of diffusion models that generate data on convex constrained sets without losing tractability.
For safety and privacy purposes, we also explore constrained sets as a new mechanism to embed invisible but quantitative information in generated data.
Our work brings new algorithmic opportunities for learning tractable diffusion on complex domains.
arXiv Detail & Related papers (2023-10-02T14:26:31Z) - A Cheaper and Better Diffusion Language Model with Soft-Masked Noise [62.719656543880596]
Masked-Diffuse LM is a novel diffusion model for language modeling, inspired by linguistic features in languages.
Specifically, we design a linguistic-informed forward process which adds corruptions to the text through strategically soft-masking to better noise the textual data.
We demonstrate that our Masked-Diffuse LM can achieve better generation quality than the state-of-the-art diffusion models with better efficiency.
arXiv Detail & Related papers (2023-04-10T17:58:42Z) - DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer [2.0483033421034142]
We introduce DLT, a joint discrete-continuous diffusion model.
DLT has a flexible conditioning mechanism that allows for conditioning on any given subset of all the layout component classes, locations, and sizes.
Our method outperforms state-of-the-art generative models on various layout generation datasets with respect to different metrics and conditioning settings.
arXiv Detail & Related papers (2023-03-07T09:30:43Z) - Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
arXiv Detail & Related papers (2023-01-20T07:36:29Z) - SDM: Spatial Diffusion Model for Large Hole Image Inpainting [106.90795513361498]
We present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image.
Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion.
arXiv Detail & Related papers (2022-12-06T13:30:18Z) - Layout-to-Image Translation with Double Pooling Generative Adversarial Networks [76.83075646527521]
We propose a novel Double Pooling GAN (DPGAN) for generating photo-realistic and semantically-consistent results from the input layout.
We also propose a novel Double Pooling Module (DPM), which consists of the Square-shape Pooling Module (SPM) and the Rectangle-shape Pooling Module (RPM).
arXiv Detail & Related papers (2021-08-29T19:55:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.