Unsupervised Discovery of Semantic Latent Directions in Diffusion Models
- URL: http://arxiv.org/abs/2302.12469v1
- Date: Fri, 24 Feb 2023 05:54:34 GMT
- Title: Unsupervised Discovery of Semantic Latent Directions in Diffusion Models
- Authors: Yong-Hyun Park, Mingi Kwon, Junghyo Jo, Youngjung Uh
- Abstract summary: We present an unsupervised method to discover interpretable editing directions for the latent variables $\mathbf{x}_t \in \mathcal{X}$ of DMs.
The discovered semantic latent directions mostly yield disentangled attribute changes, and they are globally consistent across different samples.
- Score: 6.107812768939554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the success of diffusion models (DMs), we still lack a thorough
understanding of their latent space. While image editing with GANs builds upon
latent space, DMs rely on editing the conditions such as text prompts. We
present an unsupervised method to discover interpretable editing directions for
the latent variables $\mathbf{x}_t \in \mathcal{X}$ of DMs. Our method adopts
Riemannian geometry between $\mathcal{X}$ and the intermediate feature maps
$\mathcal{H}$ of the U-Nets to provide a deep understanding over the
geometrical structure of $\mathcal{X}$. The discovered semantic latent
directions mostly yield disentangled attribute changes, and they are globally
consistent across different samples. Furthermore, editing in earlier timesteps
changes coarse attributes, while editing in later timesteps affects high-frequency
details. We define the curvedness of a line segment between samples to show
that $\mathcal{X}$ is a curved manifold. Experiments on different baselines and
datasets demonstrate the effectiveness of our method even on Stable Diffusion.
Our source code will be made publicly available for future researchers.
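The abstract does not include code, but its core idea -- treating the map from $\mathbf{x}_t$ to the U-Net's intermediate features $\mathcal{H}$ as defining a pullback metric and taking its top singular directions as editing directions -- can be illustrated with a minimal, hypothetical sketch. The `ToyFeatureMap` stand-in, the function names, and the flattened toy latent below are assumptions for illustration only, not the authors' implementation:

```python
# Hypothetical sketch (not the paper's code): discover latent directions at a fixed
# timestep t as the top right singular vectors of the Jacobian of the feature map
# f: x_t -> h_t, i.e. the directions that change the U-Net features H the most.
import torch

torch.manual_seed(0)

class ToyFeatureMap(torch.nn.Module):
    """Toy stand-in for a U-Net's intermediate feature map h = f(x_t, t).

    A real implementation would hook the bottleneck activations of a
    pretrained diffusion model at a chosen timestep t.
    """
    def __init__(self, x_dim=64, h_dim=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(x_dim, 128), torch.nn.SiLU(),
            torch.nn.Linear(128, h_dim),
        )

    def forward(self, x_flat):
        return self.net(x_flat)

def semantic_directions(f, x_t, k=3):
    """Top-k right singular vectors of J = df/dx at x_t.

    Under the (Euclidean) pullback metric g = J^T J, these are the latent
    directions along which the feature map changes the most.
    """
    J = torch.autograd.functional.jacobian(f, x_t)        # shape (h_dim, x_dim)
    _, S, Vh = torch.linalg.svd(J, full_matrices=False)   # J = U diag(S) Vh
    return Vh[:k], S[:k]                                   # directions, strengths

f = ToyFeatureMap()
x_t = torch.randn(64)                       # flattened latent at one timestep t
dirs, strengths = semantic_directions(f, x_t, k=3)
x_edited = x_t + 2.0 * dirs[0]              # move along the strongest direction
print(strengths)
```

At real model scale the full Jacobian is too large to materialize; a low-rank approximation via Jacobian-vector products (e.g. power iteration with `torch.autograd.functional.jvp`/`vjp`) would be used instead.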
Related papers
- Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis [55.561961365113554]
3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness for novel view synthesis (NVS).
However, the 3DGS model tends to overfit when trained with sparse posed views, limiting its generalization ability to novel views.
We present a Self-Ensembling Gaussian Splatting (SE-GS) approach to alleviate the overfitting problem.
Our approach improves NVS quality with few-shot training views, outperforming existing state-of-the-art methods.
arXiv Detail & Related papers (2024-10-31T18:43:48Z)
- Amortizing intractable inference in diffusion models for vision, language, and control [89.65631572949702]
This paper studies amortized sampling of the posterior over data, $\mathbf{x} \sim p^{\rm post}(\mathbf{x}) \propto p(\mathbf{x})\,r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or function $r(\mathbf{x})$.
We prove the correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior.
arXiv Detail & Related papers (2024-05-31T16:18:46Z)
- Nearly $d$-Linear Convergence Bounds for Diffusion Models via Stochastic Localization [40.808942894229325]
We provide the first convergence bounds which are linear in the data dimension.
We show that diffusion models require at most $\tilde{O}\left(\frac{d \log^2(1/\delta)}{\varepsilon^2}\right)$ steps to approximate an arbitrary distribution.
arXiv Detail & Related papers (2023-08-07T16:01:14Z)
- Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry [14.401252409755084]
We analyze the latent space $\mathbf{x}_t \in \mathcal{X}$ from a geometrical perspective.
Our approach involves deriving the local latent basis within $\mathcal{X}$ by leveraging the pullback metric (see the sketch after this list).
Remarkably, our discovered local latent basis enables image editing capabilities.
arXiv Detail & Related papers (2023-07-24T15:06:42Z)
- Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notation -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z)
- Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models [21.173910627285338]
Denoising Diffusion Models (DDMs) have emerged as a strong competitor to Generative Adversarial Networks (GANs).
In this paper, we explore the properties of h-space and propose several novel methods for finding meaningful semantic directions within it.
Our approaches are applicable without requiring architectural modifications, text-based guidance, CLIP-based optimization, or model fine-tuning.
arXiv Detail & Related papers (2023-03-20T12:59:32Z)
- Neural Implicit Manifold Learning for Topology-Aware Density Estimation [15.878635603835063]
Current generative models learn $\mathcal{M}$ by mapping an $m$-dimensional latent variable through a neural network.
We show that our model can learn manifold-supported distributions with complex topologies more accurately than pushforward models.
arXiv Detail & Related papers (2022-06-22T18:00:00Z)
- The Manifold Hypothesis for Gradient-Based Explanations [55.01671263121624]
Gradient-based explanation algorithms can provide perceptually-aligned explanations.
We show that the more a feature attribution is aligned with the tangent space of the data, the more perceptually-aligned it tends to be.
We suggest that explanation algorithms should actively strive to align their explanations with the data manifold.
arXiv Detail & Related papers (2022-06-15T08:49:24Z)
- SPAGHETTI: Editing Implicit Shapes Through Part Aware Generation [85.09014441196692]
We introduce a method for $\mathbf{E}$diting $\mathbf{I}$mplicit $\mathbf{S}$hapes $\mathbf{T}$hrough part-aware generation.
Our architecture allows for manipulation of implicit shapes by transforming, interpolating, and combining shape segments.
arXiv Detail & Related papers (2022-01-31T12:31:41Z)
- Differentially Private Exploration in Reinforcement Learning with Linear Representation [102.17246636801649]
We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration.
We further study privacy-preserving exploration in linear MDPs (Jin et al., 2020) (a.k.a. model-free setting) where we provide a $\widetilde{O}(\sqrt{K}/\epsilon)$ regret bound for $(\epsilon,\delta)$-DP exploration.
arXiv Detail & Related papers (2021-12-02T19:59:50Z)
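As referenced in the entry on the Riemannian geometry of the latent space above, the construction shared by that work and the main abstract is the pullback metric. A minimal sketch in standard notation (assumed for illustration, not taken verbatim from either paper):

```latex
% f : X -> H maps the latent x_t to the U-Net's intermediate features,
% and J_f denotes its Jacobian at x_t.
\[
  g_{x_t}(u, v) = \langle J_f\, u,\ J_f\, v \rangle_{\mathcal{H}},
  \qquad u, v \in T_{x_t}\mathcal{X}.
\]
% A local latent basis is then given by the right singular vectors of J_f
% (equivalently, the leading eigenvectors of J_f^{\top} J_f), ordered by
% singular value; the top vectors serve as candidate editing directions.
```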