Related papers: Which Way from B to A: The role of embedding geometry in image interpolation for Stable Diffusion

URL: http://arxiv.org/abs/2511.12757v1
Date: Sun, 16 Nov 2025 19:58:48 GMT
Title: Which Way from B to A: The role of embedding geometry in image interpolation for Stable Diffusion
Authors: Nicholas Karris, Luke Durell, Javier Flores, Tegan Emerson,
Abstract summary: We show that Stable Diffusion has a permutation-invariance property with respect to the rows of Contrastive Language-Image Pretraining embedding matrices. This inspired the novel observation that these embeddings can naturally be interpreted as point clouds in a Wasserstein space rather than as matrices in a Euclidean space. By recasting interpolation between two prompt embeddings as an optimal transport problem and solving it, we compute a shortest path (or geodesic) between embeddings that captures a more natural and geometrically smooth transition through the embedding space.
Score: 1.824185957798031
License: http://creativecommons.org/licenses/by/4.0/
Abstract: It can be shown that Stable Diffusion has a permutation-invariance property with respect to the rows of Contrastive Language-Image Pretraining (CLIP) embedding matrices. This inspired the novel observation that these embeddings can naturally be interpreted as point clouds in a Wasserstein space rather than as matrices in a Euclidean space. This perspective opens up new possibilities for understanding the geometry of embedding space. For example, when interpolating between embeddings of two distinct prompts, we propose reframing the interpolation problem as an optimal transport problem. By solving this optimal transport problem, we compute a shortest path (or geodesic) between embeddings that captures a more natural and geometrically smooth transition through the embedding space. This results in smoother and more coherent intermediate (interpolated) images when rendered by the Stable Diffusion generative model. We conduct experiments to investigate this effect, comparing the quality of interpolated images produced using optimal transport to those generated by other standard interpolation methods. The novel optimal transport-based approach presented indeed gives smoother image interpolations, suggesting that viewing the embeddings as point clouds (rather than as matrices) better reflects and leverages the geometry of the embedding space.
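A minimal sketch of the two ideas in the abstract, in Python with NumPy and SciPy, under stated assumptions. It assumes single-head softmax cross-attention in which keys and values are linear projections of the CLIP embedding rows (the projection matrices and embeddings are random, hypothetical stand-ins, not real CLIP output). It models the optimal transport interpolation as displacement interpolation between two equal-size, uniformly weighted point clouds under squared Euclidean cost, where the optimal plan can be taken to be a permutation and is found with an assignment solver. The paper's exact cost, weighting, and solver may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cross_attention(q, k, v):
    """Single-head softmax attention: softmax(q k^T / sqrt(d)) v."""
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


def linear_interp(a, b, t):
    """Row-by-row (Euclidean) interpolation: row i of a is paired with row i of b."""
    return (1.0 - t) * a + t * b


def ot_interp(a, b, t):
    """Displacement interpolation between two equal-size, uniformly weighted
    point clouds under squared Euclidean cost. For such clouds an optimal
    transport plan can be taken to be a permutation, so it is found here by
    solving a linear assignment problem."""
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return (1.0 - t) * a[rows] + t * b[cols]


rng = np.random.default_rng(0)
n_tokens, dim = 8, 16  # toy sizes; Stable Diffusion 1.x conditions on 77 x 768 CLIP embeddings
emb_a = rng.normal(size=(n_tokens, dim))  # random stand-ins, not real CLIP output
emb_b = rng.normal(size=(n_tokens, dim))
perm = np.roll(np.arange(n_tokens), 1)  # a fixed, non-identity reordering of the rows
emb_b_shuffled = emb_b[perm]

# Hypothetical projection matrices standing in for one cross-attention layer.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
q = rng.normal(size=(4, dim)) @ w_q  # queries from 4 toy image-latent positions

# Permuting embedding rows permutes keys and values together, leaving the output unchanged.
out = cross_attention(q, emb_b @ w_k, emb_b @ w_v)
out_shuffled = cross_attention(q, emb_b_shuffled @ w_k, emb_b_shuffled @ w_v)
print(np.allclose(out, out_shuffled))  # True

# Row-by-row interpolation depends on the arbitrary row order; the OT path does not.
print(np.allclose(linear_interp(emb_a, emb_b, 0.5), linear_interp(emb_a, emb_b_shuffled, 0.5)))  # False
print(np.allclose(ot_interp(emb_a, emb_b, 0.5), ot_interp(emb_a, emb_b_shuffled, 0.5)))  # True
```

At t = 0 the OT path returns the first embedding unchanged, and at t = 1 it returns the rows of the second embedding reordered to match; by the permutation invariance shown first, that reordering does not change what the cross-attention layer sees. Rendering images from the interpolated matrices is not shown.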

Related papers

Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation [79.27003481818413]
We introduce FlatVI, a training framework that regularises the latent manifold of discrete-likelihood variational autoencoders towards Euclidean geometry. By encouraging straight lines in the latent space to approximate geodesics on the decoded single-cell manifold, FlatVI enhances compatibility with downstream approaches.
arXiv Detail & Related papers (2025-07-15T23:08:14Z)
GrFormer: A Novel Transformer on Grassmann Manifold for Infrared and Visible Image Fusion [33.925249998725896]
We propose a novel attention mechanism based on the Grassmann manifold for infrared and visible image fusion. Our method constructs a low-rank subspace mapping through projection constraints on the Grassmann manifold. This forces the features to decouple into high-frequency details (local low-rank) and low-frequency semantics (global low-rank).
arXiv Detail & Related papers (2025-06-17T10:32:05Z)
AID: Attention Interpolation of Text-to-Image Diffusion [64.87754163416241]
We introduce a training-free technique named Attention Interpolation via Diffusion (AID). AID fuses the interpolated attention with self-attention to boost fidelity. We also present a variant, Conditional-guided Attention Interpolation via Diffusion (AID), that considers interpolation as a condition-dependent generative process.
arXiv Detail & Related papers (2024-03-26T17:57:05Z)
Point Cloud Classification via Deep Set Linearized Optimal Transport [51.99765487172328]
We introduce Deep Set Linearized Optimal Transport, an algorithm designed for the efficient simultaneous embedding of point clouds into an $L^2$-space. This embedding preserves specific low-dimensional structures within the Wasserstein space while constructing a classifier to distinguish between various classes of point clouds. We showcase the advantages of our algorithm over the standard deep set approach through experiments on a flow dataset with a limited number of labeled point clouds.
arXiv Detail & Related papers (2024-01-02T23:26:33Z)
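The entry above describes embedding point clouds into an $L^2$-space while preserving Wasserstein structure. A minimal sketch of plain linearized optimal transport (LOT), assuming equal-size, uniformly weighted clouds and squared Euclidean cost; the deep-set component described in that entry is not reproduced here. Each cloud is represented by the optimal displacement of a fixed reference cloud, so the norm of an embedding equals the exact 2-Wasserstein distance to the reference, and the distance between two embeddings is an upper bound on their Wasserstein distance. The helper names, the reference cloud, and the toy data are hypothetical choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def _match(x, y):
    """Optimal matching between equal-size clouds under squared Euclidean cost."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return cost, rows, cols


def w2_equal_size(x, y):
    """Exact 2-Wasserstein distance between two uniformly weighted clouds of equal size."""
    cost, rows, cols = _match(x, y)
    return np.sqrt(cost[rows, cols].mean())


def lot_embed(reference, cloud):
    """Linearized OT embedding: the optimal displacement of each reference point,
    flattened and scaled by 1/sqrt(n) so its norm equals W2(reference, cloud)."""
    _, rows, cols = _match(reference, cloud)
    displacement = cloud[cols] - reference[rows]
    return displacement.ravel() / np.sqrt(len(reference))


rng = np.random.default_rng(1)
reference = rng.normal(size=(10, 2))  # arbitrary fixed reference cloud
x = rng.normal(loc=2.0, size=(10, 2))
y = rng.normal(loc=-1.0, size=(10, 2))

ex, ey = lot_embed(reference, x), lot_embed(reference, y)
print(np.isclose(np.linalg.norm(ex), w2_equal_size(reference, x)))  # True: exact to the reference
print(np.linalg.norm(ex - ey) >= w2_equal_size(x, y) - 1e-9)        # True: an upper bound between clouds
print(np.allclose(lot_embed(reference, x[::-1]), ex))               # True: row order does not matter
```

Every cloud becomes a fixed-length vector regardless of how its points are ordered, so an ordinary Euclidean classifier can be trained on the embeddings directly.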
IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models [24.382275473592046]
We present a diffusion-based image morphing approach with perceptually-uniform sampling (IMPUS). IMPUS produces smooth, direct and realistic adaptations given an image pair.
arXiv Detail & Related papers (2023-11-12T10:03:32Z)
Diffusion Posterior Illumination for Ambiguity-aware Inverse Rendering [63.24476194987721]
Inverse rendering, the process of inferring scene properties from images, is a challenging inverse problem. Most existing solutions incorporate priors into the inverse-rendering pipeline to encourage plausible solutions. We propose a novel scheme that integrates a denoising probabilistic diffusion model pre-trained on natural illumination maps into an optimization framework.
arXiv Detail & Related papers (2023-09-30T12:39:28Z)
Tensor Component Analysis for Interpreting the Latent Space of GANs [41.020230946351816]
This paper addresses the problem of finding interpretable directions in the latent space of pre-trained Generative Adversarial Networks (GANs). Our scheme allows for both linear edits corresponding to the individual modes of the tensor, and non-linear ones that model the multiplicative interactions between them. We show experimentally that we can utilise the former to better separate style- from geometry-based transformations, and the latter to generate an extended set of possible transformations.
arXiv Detail & Related papers (2021-11-23T09:14:39Z)
NeurInt : Learning to Interpolate through Neural ODEs [18.104328632453676]
We propose a novel generative model that learns a distribution of trajectories between two images. We demonstrate our approach's effectiveness in generating images of improved quality as well as its ability to learn a diverse distribution over smooth trajectories for any pair of real source and target images.
arXiv Detail & Related papers (2021-11-07T16:31:18Z)
Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation [64.92152574895111]
We propose a simple Orthogonal Jacobian Regularization (OroJaR) to encourage deep generative models to learn disentangled representations. Our method is effective in disentangled and controllable image generation, and performs favorably against state-of-the-art methods.
arXiv Detail & Related papers (2021-08-17T15:01:46Z)
Joint Estimation of Image Representations and their Lie Invariants [57.3768308075675]
Images encode both the state of the world and its content. The automatic extraction of this information is challenging because of the high-dimensionality and entangled encoding inherent to the image representation. This article introduces two theoretical approaches aimed at the resolution of these challenges.
arXiv Detail & Related papers (2020-12-05T00:07:41Z)
FREDE: Linear-Space Anytime Graph Embeddings [12.53022591889574]
Low-dimensional representations, or embeddings, of a graph's nodes facilitate data mining tasks. FREquent Directions Embedding is a sketching-based method that iteratively improves on quality while processing rows of the similarity matrix individually. Our evaluation on variably sized networks shows that FREDE performs as well as SVD and competitively against current state-of-the-art methods in diverse data mining tasks.
arXiv Detail & Related papers (2020-06-08T16:51:24Z)
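FREDE's name refers to Frequent Directions, a deterministic streaming matrix-sketching algorithm that processes one row at a time. A minimal sketch of the basic Frequent Directions update in NumPy, assuming the variant that shrinks by the smallest squared singular value each time the sketch fills. How FREDE builds its similarity matrix and turns the sketch into node embeddings is not covered, and the input matrix below is a hypothetical stand-in for a similarity matrix.

```python
import numpy as np


def frequent_directions(rows, ell):
    """Frequent Directions sketch: an ell x d matrix B with B^T B close to A^T A,
    built by streaming the rows of A one at a time."""
    _, d = rows.shape
    assert 0 < ell <= d, "this simple version assumes ell <= d"
    sketch = np.zeros((ell, d))
    next_free = 0
    for row in rows:
        if next_free == ell:
            # Sketch is full: rotate onto its right singular vectors and subtract the
            # smallest squared singular value from all of them, which zeroes the last row.
            _, s, vt = np.linalg.svd(sketch, full_matrices=False)
            shrunk = np.sqrt(np.maximum(s ** 2 - s[-1] ** 2, 0.0))
            sketch = shrunk[:, None] * vt
            next_free = ell - 1
        sketch[next_free] = row
        next_free += 1
    return sketch


rng = np.random.default_rng(2)
# Hypothetical stand-in for a 200 x 30 similarity matrix: rank 5 plus small noise.
a = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 30)) + 0.01 * rng.normal(size=(200, 30))
ell = 10
b = frequent_directions(a, ell)

err = np.linalg.norm(a.T @ a - b.T @ b, ord=2)  # spectral-norm covariance error
bound = np.linalg.norm(a, "fro") ** 2 / ell     # deterministic guarantee for this shrink rule
print(b.shape)       # (10, 30)
print(err <= bound)  # True
```

Only an ell x d sketch is held in memory and each row is read once, and the bound ||A^T A - B^T B||_2 <= ||A||_F^2 / ell holds for any order in which the rows arrive.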

This list is automatically generated from the titles and abstracts of the papers on this site.