SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
- URL: http://arxiv.org/abs/2502.01639v1
- Date: Mon, 03 Feb 2025 18:59:55 GMT
- Title: SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
- Authors: Rohit Gandikota, Zongze Wu, Richard Zhang, David Bau, Eli Shechtman, Nick Kolkin
- Abstract summary: SliderSpace is a framework for automatically decomposing the visual capabilities of diffusion models.
It discovers multiple interpretable and diverse directions simultaneously from a single text prompt.
Our method produces more diverse and useful variations compared to baselines.
- Abstract: We present SliderSpace, a framework for automatically decomposing the visual capabilities of diffusion models into controllable and human-understandable directions. Unlike existing control methods that require a user to specify attributes for each edit direction individually, SliderSpace discovers multiple interpretable and diverse directions simultaneously from a single text prompt. Each direction is trained as a low-rank adaptor, enabling compositional control and the discovery of surprising possibilities in the model's latent space. Through extensive experiments on state-of-the-art diffusion models, we demonstrate SliderSpace's effectiveness across three applications: concept decomposition, artistic style exploration, and diversity enhancement. Our quantitative evaluation shows that SliderSpace-discovered directions decompose the visual structure of the model's knowledge effectively, offering insights into the latent capabilities encoded within diffusion models. User studies further validate that our method produces more diverse and useful variations compared to baselines. Our code, data, and trained weights are available at https://sliderspace.baulab.info
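Because each discovered direction is a low-rank adaptor, several sliders can be composed at inference by adding their scaled low-rank updates to the same base weight. A minimal sketch of that composition on a single linear layer (the rank, scales, and layer shapes here are illustrative assumptions, not the paper's actual configuration):

```python
import torch

def compose_sliders(W, sliders, scales):
    """Apply several low-rank slider directions to one base weight.

    W       : [out, in] base weight of a linear layer
    sliders : list of (B, A) pairs, B: [out, r], A: [r, in]
    scales  : per-slider strengths; 0 disables a slider, sign flips it
    """
    W_edited = W.clone()
    for (B, A), s in zip(sliders, scales):
        W_edited += s * (B @ A)  # low-rank update, rank r << min(out, in)
    return W_edited

# Toy example: a 512x512 layer with two rank-4 sliders.
torch.manual_seed(0)
W = torch.randn(512, 512)
sliders = [(torch.randn(512, 4) * 0.01, torch.randn(4, 512) * 0.01)
           for _ in range(2)]
W_new = compose_sliders(W, sliders, scales=[1.5, -0.5])
print((W_new - W).norm())  # magnitude of the combined edit
```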
Related papers
- Exploring the latent space of diffusion models directly through singular value decomposition [31.900933527692846]
We propose a novel image editing framework that is capable of learning arbitrary attributes from one pair of latent codes designated by text prompts in diffusion models.
We will release our code soon to foster further research and applications in this area.
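One way to picture the idea: treat the shift between two prompt-designated latent codes as a matrix of per-position channel shifts and take its singular vectors as candidate edit directions. A rough sketch of generic SVD-based direction extraction, not necessarily this paper's exact procedure (shapes and scales are illustrative):

```python
import torch

# Two latent codes, e.g., from a source prompt and an attribute prompt.
torch.manual_seed(0)
z_src = torch.randn(4, 64, 64)   # [channels, H, W] diffusion latent
z_dst = torch.randn(4, 64, 64)

# Treat each spatial position as a sample of the channel-space shift,
# then take the singular vectors of the shift matrix as directions.
delta = (z_dst - z_src).reshape(4, -1)          # [C, H*W]
U, S, Vh = torch.linalg.svd(delta, full_matrices=False)

direction = U[:, 0]                              # dominant channel direction
z_edit = z_src + 0.8 * direction[:, None, None]  # traverse the attribute
```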
arXiv Detail & Related papers (2025-02-04T11:04:36Z)
- Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts [68.48103545146127]
This paper proposes a novel framework for unsupervised exploration of diffusion latent spaces.
We directly leverage natural language prompts and image captions to map latent directions.
Our method provides a more scalable and interpretable understanding of the semantic knowledge encoded within diffusion models.
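The mapping can be pictured as scoring each latent direction's effect against a bank of caption embeddings and keeping the best-matching caption as its label. A toy sketch with random stand-in embeddings (a real pipeline would use a text-image encoder such as CLIP; all names and values here are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: embedding shifts induced by 5 latent directions, and
# embeddings of 3 candidate captions, all in a shared 512-d space.
direction_shifts = rng.normal(size=(5, 512))
caption_embeds = rng.normal(size=(3, 512))
captions = ["more smiling", "older face", "brighter lighting"]

def unit(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Cosine similarity between each direction's effect and each caption.
sims = unit(direction_shifts) @ unit(caption_embeds).T   # [5, 3]
for i, j in enumerate(sims.argmax(axis=1)):
    print(f"direction {i} -> '{captions[j]}' ({sims[i, j]:+.2f})")
```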
arXiv Detail & Related papers (2024-10-25T21:44:51Z)
- Generalizable Implicit Neural Representation As a Universal Spatiotemporal Traffic Data Learner [46.866240648471894]
Spatiotemporal traffic data (STTD) captures the complex dynamical behaviors of multiscale transportation systems.
We present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation.
We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales.
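Parameterizing STTD as an implicit neural representation amounts to fitting a small network f(location, time) → measurement to the observed records. A minimal coordinate-MLP sketch (architecture and data are illustrative, not the paper's model):

```python
import torch
import torch.nn as nn

# f(x, t) -> traffic speed: a coordinate MLP fit to observed records.
model = nn.Sequential(
    nn.Linear(3, 128), nn.ReLU(),   # input: (lon, lat, time)
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),              # output: speed at that point
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

coords = torch.rand(1024, 3)        # toy observations
speeds = torch.rand(1024, 1)
for _ in range(100):
    opt.zero_grad()
    loss = ((model(coords) - speeds) ** 2).mean()
    loss.backward()
    opt.step()

# Once fit, the INR can be queried at unobserved coordinates.
print(model(torch.tensor([[0.5, 0.5, 0.25]])))
```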
arXiv Detail & Related papers (2024-06-13T02:03:22Z)
- Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models [52.894213114914805]
We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models.
A slider is created using a small set of prompts or sample images.
Our method can help address persistent quality issues in Stable Diffusion XL, including repairing object deformations and fixing distorted hands.
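A concept slider is essentially a LoRA adaptor whose strength is exposed as a runtime knob. A minimal sketch of such a wrapper around one linear layer (rank, shapes, and the randomly initialized weights are stand-ins; the actual method trains the adaptor from prompt pairs or sample images):

```python
import torch
import torch.nn as nn

class SliderLinear(nn.Module):
    """Linear layer with a low-rank slider whose strength is a knob."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        # Random weights stand in for a trained slider here.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.randn(base.out_features, rank) * 0.01)
        self.scale = 0.0  # user-facing slider; set at inference time

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = SliderLinear(nn.Linear(64, 64))
x = torch.randn(1, 64)
layer.scale = 0.0; y0 = layer(x)    # slider off: identical to base
layer.scale = 2.0; y1 = layer(x)    # slider on: attribute pushed
print((y1 - y0).norm())
```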
arXiv Detail & Related papers (2023-11-20T18:59:01Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we encode randomly sampled Gaussian heatmaps into the intermediate layers of generative models as a spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
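The inductive bias can be sketched as sampling a Gaussian heatmap over the feature grid and fusing it into an intermediate activation, so that moving the heatmap later moves content. A toy version (fusion by concatenation; the paper's actual encoding scheme may differ):

```python
import torch

def gaussian_heatmap(h, w, center, sigma):
    """Heatmap over an h x w grid, peaked at `center` (row, col)."""
    ys = torch.arange(h).float()[:, None]
    xs = torch.arange(w).float()[None, :]
    d2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

feats = torch.randn(1, 256, 16, 16)          # intermediate GAN features
heat = gaussian_heatmap(16, 16, center=(4, 12), sigma=2.0)
# Fuse the spatial prior into the features; at inference the user can
# move `center` to drag the corresponding object around the layout.
conditioned = torch.cat([feats, heat[None, None]], dim=1)  # [1, 257, 16, 16]
print(conditioned.shape)
```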
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
- Discovering Class-Specific GAN Controls for Semantic Image Synthesis [73.91655061467988]
We propose a novel method for finding spatially disentangled class-specific directions in the latent space of pretrained SIS models.
We show that the latent directions found by our method can effectively control the local appearance of semantic classes.
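Spatially disentangled, class-specific control can be pictured as adding a learned direction to the latent only where the semantic map assigns that class. A toy sketch with a spatial latent (shapes and the random stand-in for a learned direction are assumptions):

```python
import torch

torch.manual_seed(0)
z = torch.randn(64, 32, 32)          # spatial latent of an SIS model
seg = torch.randint(0, 5, (32, 32))  # semantic map with 5 classes
direction = torch.randn(64)          # stand-in for a learned direction

# Edit only the regions labeled with class 3: others stay untouched.
mask = (seg == 3).float()
z_edit = z + 1.5 * direction[:, None, None] * mask
print(mask.mean())  # fraction of the image affected by the edit
```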
arXiv Detail & Related papers (2022-12-02T21:39:26Z)
- Fantastic Style Channels and Where to Find Them: A Submodular Framework for Discovering Diverse Directions in GANs [0.0]
StyleGAN2 has enabled various image generation and manipulation tasks due to its rich and disentangled latent spaces.
We design a novel submodular framework that finds the most representative and diverse subset of directions in the latent space of StyleGAN2.
Our framework promotes diversity by using the notion of clusters and can be efficiently solved with a greedy optimization scheme.
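The greedy scheme repeatedly adds the candidate direction that most improves coverage of the remaining ones, a standard facility-location objective. A small sketch (the actual framework also exploits cluster structure; the similarities here are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 5
dirs = rng.normal(size=(n, 128))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
sim = np.abs(dirs @ dirs.T)          # pairwise |cosine| similarity

selected = []
for _ in range(k):
    best, best_gain = None, -np.inf
    for i in range(n):
        if i in selected:
            continue
        # Facility location: each direction is "covered" by its most
        # similar selected direction; score the total coverage.
        cover = sim[:, selected + [i]].max(axis=1).sum()
        if cover > best_gain:
            best, best_gain = i, cover
    selected.append(best)
print("diverse subset:", selected)
```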
arXiv Detail & Related papers (2022-03-16T10:35:41Z)
- Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process benefits various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
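One well-known closed-form route to unsupervised directions (a SeFa-style decomposition, not necessarily this paper's algorithm) takes the top eigenvectors of the first projection layer's weights: the latent directions the generator amplifies most. Sketch, with a random weight matrix standing in for a pretrained one:

```python
import torch

torch.manual_seed(0)
W = torch.randn(1024, 512)            # first layer mapping z -> features

# Directions z that W stretches the most: top eigenvectors of W^T W.
eigvals, eigvecs = torch.linalg.eigh(W.T @ W)  # ascending eigenvalues
directions = eigvecs[:, -5:].T        # 5 strongest directions, [5, 512]

z = torch.randn(1, 512)
z_edit = z + 3.0 * directions[0]      # traverse the dominant direction
```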
arXiv Detail & Related papers (2020-11-24T02:18:08Z)
- IntroVAC: Introspective Variational Classifiers for Learning Interpretable Latent Subspaces [6.574517227976925]
IntroVAC learns interpretable latent subspaces by exploiting information from an additional label.
We show that IntroVAC is able to learn meaningful directions in the latent space enabling fine manipulation of image attributes.
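With a linear classifier attached to the latent space, its weight vector is a natural attribute direction: moving along it raises the predicted label. A toy traversal sketch (the classifier weights here are random stand-ins for trained ones, and decoding is left implicit):

```python
import torch

torch.manual_seed(0)
z = torch.randn(1, 128)              # latent code of an image
w = torch.randn(128)                 # stand-in: trained classifier weights

direction = w / w.norm()             # attribute direction in latent space
for s in (-2.0, 0.0, 2.0):
    z_s = z + s * direction          # decode(z_s) would vary the attribute
    print(s, float(z_s @ w))         # classifier logit rises with s
```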
arXiv Detail & Related papers (2020-08-03T10:21:41Z)