CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation

URL: http://arxiv.org/abs/2509.01028v2
Date: Wed, 03 Sep 2025 15:01:47 GMT
Title: CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation
Authors: Zixin Zhu, Kevin Duarte, Mamshad Nayeem Rizve, Chengyuan Xu, Ratheesh Kalarot, Junsong Yuan,
Abstract summary: In text-to-image (T2I) generation, achieving fine-grained control over attributes - such as age or smile - remains challenging. We introduce CompSlider, which generates a conditional prior for the T2I foundation model to control multiple attributes simultaneously. We evaluate our approach on a variety of image attributes and highlight its generality by extending to video generation.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In text-to-image (T2I) generation, achieving fine-grained control over attributes - such as age or smile - remains challenging, even with detailed text prompts. Slider-based methods offer a solution for precise control of image attributes. Existing approaches typically train an individual adapter for each attribute separately, overlooking the entanglement among multiple attributes. As a result, interference occurs among different attributes, preventing precise control of multiple attributes together. To address this challenge, we aim to disentangle multiple attributes in slider-based generation to enable more reliable and independent attribute manipulation. Our approach, CompSlider, can generate a conditional prior for the T2I foundation model to control multiple attributes simultaneously. Furthermore, we introduce novel disentanglement and structure losses to compose multiple attribute changes while maintaining structural consistency within the image. Since CompSlider operates in the latent space of the conditional prior and does not require retraining the foundation model, it reduces the computational burden for both training and inference. We evaluate our approach on a variety of image attributes and highlight its generality by extending to video generation.
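The interference problem described in the abstract can be illustrated with a toy linear model. This is a hedged sketch for intuition only, not CompSlider's method: it assumes each attribute is read out as a projection of a latent vector onto a fixed axis, and that each slider shifts the latent along its own edit direction. The attribute names, directions, and values are all hypothetical.

```python
# Toy linear model (an assumption for illustration, not CompSlider's method):
# each attribute is read out as the dot product of a latent vector with a
# fixed "attribute axis", and each slider moves the latent along its own
# edit direction.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply_sliders(latent, directions, values):
    """Move `latent` by values[i] * directions[i] for every slider i."""
    out = list(latent)
    for d, s in zip(directions, values):
        out = [x + s * di for x, di in zip(out, d)]
    return out

def read_attributes(latent, axes):
    """Read each attribute as the projection of the latent onto its axis."""
    return [dot(latent, a) for a in axes]

# Hypothetical 2-D latent with two attribute axes: "age" and "smile".
axes = [[1.0, 0.0], [0.0, 1.0]]

# Independently trained edit directions that are NOT orthogonal:
# the "age" direction leaks into the "smile" axis.
entangled = [[1.0, 0.5], [0.0, 1.0]]
# Disentangled directions: each slider only moves its own attribute.
disentangled = [[1.0, 0.0], [0.0, 1.0]]

origin = [0.0, 0.0]
values = [2.0, 0.0]  # move only the "age" slider

print(read_attributes(apply_sliders(origin, entangled, values), axes))     # [2.0, 1.0]
print(read_attributes(apply_sliders(origin, disentangled, values), axes))  # [2.0, 0.0]
```

Moving only the age slider also shifts the smile readout when the edit directions overlap. This is the kind of cross-attribute interference that the abstract's disentanglement loss is meant to address, although the abstract does not say how CompSlider enforces it.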

Related papers

SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control
We introduce SliderEdit, a framework for continuous image editing with fine-grained, interpretable instruction control. Given a multi-part edit instruction, SliderEdit disentangles the individual instructions and exposes each as a globally trained slider. Our results pave the way for interactive, instruction-driven image manipulation with continuous and compositional control.
arXiv (2025-11-12T20:21:37Z)
All-in-One Slider for Attribute Manipulation in Diffusion Models
We introduce the All-in-One Slider, a lightweight module that decomposes the text embedding space into sparse, semantically meaningful attribute directions. By recombining the learned directions, the All-in-One Slider supports zero-shot manipulation of unseen attributes. Our method can be extended to integrate with the inversion framework to perform attribute manipulation on real images.
arXiv (2025-08-26T16:56:30Z)
Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
We introduce the Attribute (Att) Adapter, a novel plug-and-play module designed to enable fine-grained, multi-attribute control in pretrained diffusion models. Att-Adapter is flexible, requiring no paired synthetic data for training, and is easily scalable to multiple attributes within a single model.
arXiv (2025-03-15T01:06:34Z)
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
Recent advances in text-to-image (T2I) diffusion models have significantly improved the quality of generated images. Providing efficient control over individual subjects, particularly the attributes characterizing them, remains a key challenge. No current approach offers both simultaneously, resulting in a gap when trying to achieve precise continuous and subject-specific attribute modulation.
arXiv (2024-03-25T18:00:42Z)
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models. A slider is created using a small set of prompts or sample images. Our method can help address persistent quality issues in Stable Diffusion XL, including repair of object deformations and fixing distorted hands.
arXiv (2023-11-20T18:59:01Z)
PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in the image. We show that having control over the properties of each object in an image leads to comprehensive editing capabilities. Our framework allows for various object-level editing operations on real images such as reference image-based appearance editing, free-form shape editing, adding objects, and variations.
arXiv (2023-03-30T17:13:56Z)
ManiCLIP: Multi-Attribute Face Manipulation from Text
We present a novel multi-attribute face manipulation method based on textual descriptions. Our method generates natural manipulated faces with minimal text-irrelevant attribute editing.
arXiv (2022-10-02T07:22:55Z)
Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation
Controllable semantic image editing enables a user to change entire image attributes with a few clicks. Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism. We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work, which primarily focuses on qualitative evaluation.
arXiv (2021-02-01T21:38:36Z)
SMILE: Semantically-guided Multi-attribute Image and Layout Editing
Attribute image manipulation has been a very active topic since the introduction of Generative Adversarial Networks (GANs). We present a multimodal representation that handles all attributes, be it guided by random noise or images, while only using the underlying domain information of the target domain. Our method is capable of adding, removing or changing either fine-grained or coarse attributes by using an image as a reference or by exploring the style distribution space.
arXiv (2020-10-05T20:15:21Z)
Prominent Attribute Modification using Attribute Dependent Generative Adversarial Network
The proposed approach is based on two generators and two discriminators that utilize both the binary and the real-valued representations of the attributes. Experiments on the CelebA dataset show that our method effectively performs multiple-attribute editing while keeping other facial details intact.
arXiv (2020-04-24T13:38:05Z)
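The Concept Sliders entry above describes sliders as LoRA adaptors. A minimal sketch of how such a slider can act on a model, assuming the slider value alpha simply scales a low-rank update added to a frozen weight matrix (W' = W + alpha * B A). The matrix sizes and values are hypothetical, and plain nested lists stand in for a tensor library; this is not the Concept Sliders implementation.

```python
# Minimal sketch (assumption): a slider as a low-rank update to a frozen
# weight matrix, W' = W + alpha * (B @ A), where alpha is the slider value.
# Matrices are plain nested lists so the example needs no dependencies.

def matmul(x, y):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]

def slider_weight(w, b, a, alpha):
    """Return W + alpha * (B @ A); alpha = 0 recovers the original weights."""
    delta = matmul(b, a)
    return [
        [w[i][j] + alpha * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]

# Hypothetical frozen 2x2 weight and a rank-1 LoRA pair (B: 2x1, A: 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, -0.5]]

# Negative, zero, and positive slider settings.
for alpha in (-1.0, 0.0, 1.0):
    print(alpha, slider_weight(W, B, A, alpha))
```

Because the update is linear in alpha, one trained low-rank pair yields a continuous control: negative values push the concept one way, positive values the other, and zero leaves the base model unchanged.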
