Understanding the Latent Space of Diffusion Models through the Lens of
Riemannian Geometry
- URL: http://arxiv.org/abs/2307.12868v2
- Date: Fri, 27 Oct 2023 02:34:05 GMT
- Title: Understanding the Latent Space of Diffusion Models through the Lens of
Riemannian Geometry
- Authors: Yong-Hyun Park, Mingi Kwon, Jaewoong Choi, Junghyo Jo, Youngjung Uh
- Abstract summary: We analyze the latent space $\mathbf{x}_t \in \mathcal{X}$ from a geometrical perspective.
Our approach involves deriving the local latent basis within $\mathcal{X}$ by leveraging the pullback metric.
Remarkably, our discovered local latent basis enables image editing capabilities.
- Score: 14.401252409755084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the success of diffusion models (DMs), we still lack a thorough
understanding of their latent space. To understand the latent space
$\mathbf{x}_t \in \mathcal{X}$, we analyze it from a geometrical perspective.
Our approach involves deriving the local latent basis within $\mathcal{X}$ by
leveraging the pullback metric associated with their encoding feature maps.
Remarkably, our discovered local latent basis enables image editing
capabilities by moving $\mathbf{x}_t$, the latent space of DMs, along the basis
vector at specific timesteps. We further analyze how the geometric structure of
DMs evolves over diffusion timesteps and differs across different text
conditions. This confirms the known phenomenon of coarse-to-fine generation, as
well as reveals novel insights such as the discrepancy between $\mathbf{x}_t$
across timesteps, the effect of dataset complexity, and the time-varying
influence of text prompts. To the best of our knowledge, this paper is the
first to present image editing through $\mathbf{x}$-space traversal, editing
only once at a specific timestep $t$ without any additional training, and
providing thorough analyses of the latent structure of DMs. The code to
reproduce our experiments can be found at
https://github.com/enkeejunior1/Diffusion-Pullback.
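As a rough illustration of the approach described in the abstract, the sketch below derives a local latent basis at $\mathbf{x}_t$ from the pullback metric of an encoding feature map (e.g., the U-Net bottleneck features $h$) via subspace iteration on $J^\top J$ with $J = \partial h / \partial \mathbf{x}_t$, and then edits by moving $\mathbf{x}_t$ along one basis vector. This is a minimal sketch under stated assumptions, not the authors' implementation (see the repository above); `bottleneck_fn`, the hyperparameters, and the editing strength are hypothetical.

```python
# Minimal sketch, not the authors' implementation (see the linked repository).
# `bottleneck_fn` is a hypothetical wrapper mapping a noisy latent x_t to the
# encoding feature map h (e.g., the U-Net bottleneck) at a fixed timestep t.
import torch
from torch.autograd.functional import jvp, vjp

def local_latent_basis(bottleneck_fn, x_t, n_vectors=3, n_iters=10):
    """Approximate the top right-singular vectors of J = dh/dx_t.

    Under the pullback metric g(u, v) = <J u, J v>, these directions span the
    local latent basis at x_t (dominant directions first, approximately).
    """
    basis = [torch.randn_like(x_t) for _ in range(n_vectors)]
    for _ in range(n_iters):  # subspace iteration on J^T J
        updated = []
        for u in basis:
            _, ju = jvp(bottleneck_fn, x_t, u)     # push direction into feature space: J u
            _, jtju = vjp(bottleneck_fn, x_t, ju)  # pull it back to X: J^T (J u)
            updated.append(jtju)
        # Re-orthonormalize the current subspace with a QR decomposition.
        stacked = torch.stack([v.reshape(-1) for v in updated], dim=1)
        q, _ = torch.linalg.qr(stacked)
        basis = [q[:, i].reshape(x_t.shape) for i in range(n_vectors)]
    return basis

def edit_latent(x_t, direction, strength=5.0):
    """Shift x_t once along a discovered basis vector at the chosen timestep;
    the remaining denoising steps then yield the edited image."""
    return x_t + strength * direction
```

In this sketch the choice of timestep $t$ and the strength control how coarse or fine the resulting edit is, in line with the coarse-to-fine behavior discussed in the abstract; the exact basis extraction and editing procedure are those in the paper and repository.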
Related papers
- Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models [65.71506381302815]
We propose to amortize the cost of sampling from a posterior distribution of the form $p(\mathbf{x}\mid\mathbf{y}) \propto p_\theta(\mathbf{x})$.
For many models and constraints of interest, the posterior in the noise space is smoother than the posterior in the data space, making it more amenable to such amortized inference.
arXiv Detail & Related papers (2025-02-10T19:49:54Z) - Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images? [21.600998338094794]
We focus on the ability of diffusion models (DMs) to learn hidden rules between image features.
We investigate whether DMs can accurately capture the inter-feature rule ($p(\mathbf{y}|\mathbf{x})$).
We design four synthetic tasks with strongly correlated features to assess DMs' rule-learning abilities.
arXiv Detail & Related papers (2025-02-07T07:49:37Z) - Conditional Mutual Information Based Diffusion Posterior Sampling for Solving Inverse Problems [3.866047645663101]
In computer vision, tasks such as inpainting, deblurring, and super-resolution are commonly formulated as inverse problems.
Recently, diffusion models (DMs) have emerged as a promising approach for addressing noisy linear inverse problems.
We propose an information-theoretic approach to improve the effectiveness of DMs in solving inverse problems.
arXiv Detail & Related papers (2025-01-06T09:45:26Z) - Monge-Ampere Regularization for Learning Arbitrary Shapes from Point Clouds [69.69726932986923]
We propose the scaled-squared distance function (S$^2$DF), a novel implicit surface representation for modeling arbitrary surface types.
S$^2$DF does not distinguish between inside and outside regions while effectively addressing the non-differentiability issue of UDF at the zero level set.
We demonstrate that S$^2$DF satisfies a second-order partial differential equation of Monge-Ampère type.
arXiv Detail & Related papers (2024-10-24T06:56:34Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that set it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Unsupervised Discovery of Semantic Latent Directions in Diffusion Models [6.107812768939554]
We present an unsupervised method to discover interpretable editing directions for the latent variables $\mathbf{x}_t \in \mathcal{X}$ of DMs.
The discovered semantic latent directions mostly yield disentangled attribute changes, and they are globally consistent across different samples.
arXiv Detail & Related papers (2023-02-24T05:54:34Z) - Understanding Deep Neural Function Approximation in Reinforcement
Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL).
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed with Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it into a generalization problem under the non-i.i.d. setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z) - SPAGHETTI: Editing Implicit Shapes Through Part Aware Generation [85.09014441196692]
We introduce a method for $\mathbf{E}$diting $\mathbf{I}$mplicit $\mathbf{S}$hapes $\mathbf{T}$hrough part-aware generation.
Our architecture allows for manipulation of implicit shapes by means of transforming, interpolating and combining shape segments together.
arXiv Detail & Related papers (2022-01-31T12:31:41Z) - Differentially Private Exploration in Reinforcement Learning with Linear
Representation [102.17246636801649]
We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. the model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration.
We further study privacy-preserving exploration in linear MDPs (Jin et al., 2020) (a.k.a. the model-free setting), where we provide a $\widetilde{O}(\sqrt{K}/\epsilon)$ regret bound for $(\epsilon,\delta)$-DP exploration.
arXiv Detail & Related papers (2021-12-02T19:59:50Z) - Mask-Guided Discovery of Semantic Manifolds in Generative Models [0.0]
StyleGAN2 generates images of human faces from random vectors in a lower-dimensional latent space.
The model behaves as a black box, providing neither control over its output nor insight into the structures it has learned from the data.
We present a method to explore the manifold of changes of spatially localized regions of the face.
arXiv Detail & Related papers (2021-05-15T18:06:38Z) - Predicting First Passage Percolation Shapes Using Neural Networks [0.0]
We construct and fit a neural network able to adequately predict the shape of the set of discovered sites.
The main purpose is to give researchers a new tool for quickly getting an impression of the shape from the distribution of the passage times.
arXiv Detail & Related papers (2020-06-24T19:10:21Z)