Latent Diffusion : Multi-Dimension Stable Diffusion Latent Space Explorer
- URL: http://arxiv.org/abs/2509.22038v1
- Date: Fri, 26 Sep 2025 08:15:58 GMT
- Title: Latent Diffusion : Multi-Dimension Stable Diffusion Latent Space Explorer
- Authors: Zhihua Zhong, Xuanyang Huang
- Abstract summary: This paper introduces a framework for integrating customizable latent space operations into the diffusion process. By enabling direct manipulation of conceptual and spatial representations, this approach expands creative possibilities in generative art.
- Score: 6.6933005224319695
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Latent space is one of the key concepts in generative AI, offering powerful means for creative exploration through vector manipulation. However, diffusion models like Stable Diffusion lack the intuitive latent vector control found in GANs, limiting their flexibility for artistic expression. This paper introduces a framework for integrating customizable latent space operations into the diffusion process. By enabling direct manipulation of conceptual and spatial representations, this approach expands creative possibilities in generative art. We demonstrate the potential of this framework through two artworks, Infinitepedia and Latent Motion, highlighting its use in conceptual blending and dynamic motion generation. Our findings reveal latent space structures with semantic and meaningless regions, offering insights into the geometry of diffusion models and paving the way for further explorations of latent space.
Related papers
- Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner [66.86440230599656]
We argue that diffusion language models do not necessarily need to be in the discrete space. In particular, we prove that continuous diffusion models have stronger expressivity than discrete diffusions and looped transformers. We propose Coevolutionary Continuous Diffusion (CCDD), which defines a joint multimodal diffusion process on the union of a continuous representation space and a discrete token space.
arXiv Detail & Related papers (2025-10-03T17:44:41Z)
- Exploring the latent space of diffusion models directly through singular value decomposition [31.900933527692846]
We propose a novel image editing framework that is capable of learning arbitrary attributes from one pair of latent codes destined by text prompts in Diffusion Models. We will release our codes soon to foster further research and applications in this area.
arXiv Detail & Related papers (2025-02-04T11:04:36Z)
- SliderSpace: Decomposing the Visual Capabilities of Diffusion Models [50.82362500995365]
SliderSpace is a framework for automatically decomposing the visual capabilities of diffusion models. It discovers multiple interpretable and diverse directions simultaneously from a single text prompt. Our method produces more diverse and useful variations compared to baselines.
arXiv Detail & Related papers (2025-02-03T18:59:55Z)
- Exploring Representation-Aligned Latent Space for Better Generation [86.45670422239317]
We introduce ReaLS, which integrates semantic priors to improve generation performance. We show that fundamental DiT and SiT trained on ReaLS can achieve a 15% improvement in FID metric. The enhanced semantic latent space enables more perceptual downstream tasks, such as segmentation and depth estimation.
arXiv Detail & Related papers (2025-02-01T07:42:12Z)
- Unsupervised Region-Based Image Editing of Denoising Diffusion Models [50.005612464340246]
We propose a method to identify semantic attributes in the latent space of pre-trained diffusion models without any further training. Our approach facilitates precise semantic discovery and control over local masked areas, eliminating the need for annotations.
arXiv Detail & Related papers (2024-12-17T13:46:12Z)
- Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts [68.48103545146127]
This paper proposes a novel framework for unsupervised exploration of diffusion latent spaces.
We directly leverage natural language prompts and image captions to map latent directions.
Our method provides a more scalable and interpretable understanding of the semantic knowledge encoded within diffusion models.
arXiv Detail & Related papers (2024-10-25T21:44:51Z)
- How to Blend Concepts in Diffusion Models [48.68800153838679]
Recent methods exploit multiple latent representations and their connection, making this research question even more entangled.
Our goal is to understand how operations in the latent space affect the underlying concepts.
Our conclusion is that concept blending through space manipulation is possible, although the best strategy depends on the context of the blend.
arXiv Detail & Related papers (2024-07-19T13:05:57Z)
- Isometric Representation Learning for Disentangled Latent Space of Diffusion Models [17.64488229224982]
We present Isometric Diffusion, equipping a diffusion model with a geometric regularizer to guide the model to learn a geometrically sound latent space of the training data manifold.
This approach allows diffusion models to learn a more disentangled latent space, which enables smoother, more accurate inversion and more precise control over attributes directly in the latent space.
arXiv Detail & Related papers (2024-07-16T07:36:01Z)
- Fine-grained Appearance Transfer with Diffusion Models [23.29713777525402]
Image-to-image translation (I2I) seeks to alter the visual appearance between images while maintaining structural coherence.
This paper proposes an innovative framework designed to surmount these challenges by integrating various aspects of semantic matching, appearance transfer, and latent deviation.
arXiv Detail & Related papers (2023-11-27T04:00:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.