All-in-One Slider for Attribute Manipulation in Diffusion Models
- URL: http://arxiv.org/abs/2508.19195v1
- Date: Tue, 26 Aug 2025 16:56:30 GMT
- Title: All-in-One Slider for Attribute Manipulation in Diffusion Models
- Authors: Weixin Ye, Hongguang Zhu, Wei Wang, Yahui Liu, Mengyu Wang,
- Abstract summary: We introduce the All-in-One Slider, a lightweight module that decomposes the text embedding space into sparse, semantically meaningful attribute directions. By recombining the learned directions, the All-in-One Slider supports zero-shot manipulation of unseen attributes. Our method can be extended to integrate with the inversion framework to perform attribute manipulation on real images.
- Score: 13.362768653792097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image (T2I) diffusion models have made significant strides in generating high-quality images. However, progressively manipulating certain attributes of generated images to meet the desired user expectations remains challenging, particularly for content with rich details, such as human faces. Some studies have attempted to address this by training slider modules. However, they follow a One-for-One manner, where an independent slider is trained for each attribute, requiring additional training whenever a new attribute is introduced. This not only results in parameter redundancy accumulated by sliders but also restricts the flexibility of practical applications and the scalability of attribute manipulation. To address this issue, we introduce the All-in-One Slider, a lightweight module that decomposes the text embedding space into sparse, semantically meaningful attribute directions. Once trained, it functions as a general-purpose slider, enabling interpretable and fine-grained continuous control over various attributes. Moreover, by recombining the learned directions, the All-in-One Slider supports zero-shot manipulation of unseen attributes (e.g., races and celebrities) and the composition of multiple attributes. Extensive experiments demonstrate that our method enables accurate and scalable attribute manipulation, achieving notable improvements compared to previous methods. Furthermore, our method can be extended to integrate with the inversion framework to perform attribute manipulation on real images, broadening its applicability to various real-world scenarios. The code and trained model will be released at: https://github.com/ywxsuperstar/KSAE-FaceSteer.
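The abstract describes the mechanism only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a top-k sparse autoencoder over text embeddings (a guess prompted by the repository name, not confirmed by the abstract), uses randomly initialised weights in place of trained ones, and introduces hypothetical names (`encode_topk`, `slide`, `W_enc`, `W_dec`) and sizes purely for illustration. It shows a sparse code over a dictionary of directions, and a slider that shifts an embedding along one or more of those directions with a continuous strength.

```python
import numpy as np

rng = np.random.default_rng(0)
d_embed, n_dirs, k = 16, 64, 4  # hypothetical sizes, not from the paper

# Random stand-ins for trained encoder/decoder weights (assumption).
W_enc = rng.normal(size=(n_dirs, d_embed)) / np.sqrt(d_embed)
W_dec = rng.normal(size=(n_dirs, d_embed))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # unit-norm directions


def encode_topk(e, k):
    """Sparse code: ReLU activations with all but the k largest set to zero."""
    a = np.maximum(W_enc @ e, 0.0)
    z = np.zeros_like(a)
    idx = np.argsort(a)[-k:]  # indices of the k largest activations
    z[idx] = a[idx]
    return z


def slide(e, strengths):
    """Shift embedding e along chosen directions.

    strengths maps a direction index to a signed slider value; passing
    several entries composes multiple attribute edits.
    """
    out = e.copy()
    for j, alpha in strengths.items():
        out += alpha * W_dec[j]
    return out


e = rng.normal(size=d_embed)                # stand-in for a text embedding
z = encode_topk(e, k)                       # at most k active directions
edited = slide(e, {3: 1.5, 10: -0.5})       # compose two hypothetical attributes
```

In a real pipeline the edited embedding would be passed to the diffusion model in place of the original text embedding; which directions correspond to which attributes would have to be read off the trained model.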
Related papers
- CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation [29.82973120718493]
In text-to-image (T2I) generation, achieving fine-grained control over attributes - such as age or smile - remains challenging. We introduce CompSlider, which generates a conditional prior for the T2I foundation model to control multiple attributes simultaneously. We evaluate our approach on a variety of image attributes and highlight its generality by extending to video generation.
arXiv Detail & Related papers (2025-08-31T23:36:44Z)
- Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models [53.385754347812835]
Concept Sliders introduced a method for fine-grained image control and editing by learning concepts (attributes/objects).
This approach adds parameters and increases inference time due to the loading and unloading of Low-Rank Adapters (LoRAs) used for learning concepts.
We propose a straightforward textual inversion method to learn concepts through text embeddings, which are generalizable across models that share the same text encoder.
arXiv Detail & Related papers (2024-09-25T01:02:30Z)
- Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition [58.79807861739438]
Existing pedestrian attribute recognition (PAR) algorithms are mainly developed based on static images.
We propose to understand human attributes using video frames that can fully use temporal information.
arXiv Detail & Related papers (2024-04-27T14:43:32Z)
- Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models [52.894213114914805]
We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models.
A slider is created using a small set of prompts or sample images.
Our method can help address persistent quality issues in Stable Diffusion XL, including repairing object deformations and fixing distorted hands.
arXiv Detail & Related papers (2023-11-20T18:59:01Z)
- Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe$^{2}$) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe$^{2}$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z)
- Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion Image Manipulation [27.587905673112473]
Fashion attribute editing is a task that aims to convert the semantic attributes of a given fashion image while preserving the irrelevant regions.
Previous works typically employ conditional GANs where the generator explicitly learns the target attributes and directly executes the conversion.
We explore classifier-guided diffusion that leverages an off-the-shelf diffusion model pretrained on general visual semantics such as ImageNet.
arXiv Detail & Related papers (2022-10-12T02:21:18Z)
- ManiCLIP: Multi-Attribute Face Manipulation from Text [104.30600573306991]
We present a novel multi-attribute face manipulation method based on textual descriptions.
Our method generates natural manipulated faces with minimal text-irrelevant attribute editing.
arXiv Detail & Related papers (2022-10-02T07:22:55Z)
- Everything is There in Latent Space: Attribute Editing and Attribute Style Manipulation by StyleGAN Latent Space Exploration [39.18239951479647]
We present Few-shot Latent-based Attribute Manipulation and Editing (FLAME).
FLAME is a framework to perform highly controlled image editing by latent space manipulation.
We generate diverse attribute styles in a disentangled manner.
arXiv Detail & Related papers (2022-07-20T12:40:32Z)
- Boosting Zero-shot Learning via Contrastive Optimization of Attribute Representations [28.46906100680767]
We propose a new framework to boost Zero-shot learning (ZSL) by explicitly learning attribute prototypes beyond images.
A prototype generation module is designed to generate attribute prototypes from attribute semantics.
A hard example-based contrastive optimization scheme is introduced to reinforce attribute-level features in the embedding space.
arXiv Detail & Related papers (2022-07-08T11:05:35Z)
- SMILE: Semantically-guided Multi-attribute Image and Layout Editing [154.69452301122175]
Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs).
We present a multimodal representation that handles all attributes, be it guided by random noise or images, while only using the underlying domain information of the target domain.
Our method is capable of adding, removing or changing either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space.
arXiv Detail & Related papers (2020-10-05T20:15:21Z)