AttriCtrl: Fine-Grained Control of Aesthetic Attribute Intensity in Diffusion Models
- URL: http://arxiv.org/abs/2508.02151v1
- Date: Mon, 04 Aug 2025 07:49:40 GMT
- Title: AttriCtrl: Fine-Grained Control of Aesthetic Attribute Intensity in Diffusion Models
- Authors: Die Chen, Zhongjie Duan, Zhiwen Li, Cen Chen, Daoyuan Chen, Yaliang Li, Yinda Chen
- Abstract summary: AttriCtrl is a plug-and-play framework for precise and continuous control of aesthetic attributes. We quantify abstract aesthetics by leveraging semantic similarity from pre-trained vision-language models. It is fully compatible with popular open-source controllable generation frameworks.
- Score: 32.46570968627392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent breakthroughs in text-to-image diffusion models have significantly enhanced both the visual fidelity and semantic controllability of generated images. However, fine-grained control over aesthetic attributes remains challenging, especially when users require continuous and intensity-specific adjustments. Existing approaches often rely on vague textual prompts, which are inherently ambiguous in expressing both the aesthetic semantics and the desired intensity, or depend on costly human preference data for alignment, limiting their scalability and practicality. To address these limitations, we propose AttriCtrl, a plug-and-play framework for precise and continuous control of aesthetic attributes. Specifically, we quantify abstract aesthetics by leveraging semantic similarity from pre-trained vision-language models, and employ a lightweight value encoder that maps scalar intensities in $[0,1]$ to learnable embeddings within diffusion-based generation. This design enables intuitive and customizable aesthetic manipulation, with minimal training overhead and seamless integration into existing generation pipelines. Extensive experiments demonstrate that AttriCtrl achieves accurate control over individual attributes as well as flexible multi-attribute composition. Moreover, it is fully compatible with popular open-source controllable generation frameworks, showcasing strong integration capability and practical utility across diverse generation scenarios.
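To make the described mechanism concrete, below is a minimal sketch of the two components the abstract mentions: an aesthetic-intensity score derived from vision-language-model similarity, and a lightweight value encoder that maps a scalar in [0,1] to an embedding for the diffusion pipeline. The module names, dimensions, and CLIP-style cosine-similarity scoring are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: names, shapes, and the cosine-similarity scoring rule are
# assumptions for illustration; this is not the AttriCtrl reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ValueEncoder(nn.Module):
    """Hypothetical lightweight encoder mapping a scalar intensity in [0, 1]
    to an embedding that can be injected alongside text conditioning."""
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, intensity: torch.Tensor) -> torch.Tensor:
        # intensity: (batch,) scalars in [0, 1] -> (batch, 1, embed_dim)
        return self.mlp(intensity.unsqueeze(-1)).unsqueeze(1)

def aesthetic_intensity(image_emb: torch.Tensor, attr_text_emb: torch.Tensor) -> torch.Tensor:
    """Assumed scoring rule: cosine similarity between a vision-language model's
    image embedding and an attribute-prompt embedding, rescaled to [0, 1]."""
    sim = F.cosine_similarity(image_emb, attr_text_emb, dim=-1)
    return (sim + 1.0) / 2.0

# Usage sketch: the resulting embedding would typically be concatenated with
# (or added to) the text-encoder output fed to the diffusion model's
# cross-attention layers.
encoder = ValueEncoder()
attr_embedding = encoder(torch.tensor([0.3, 0.8]))  # two intensity settings
print(attr_embedding.shape)  # torch.Size([2, 1, 768])
```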
Related papers
- RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation [16.038598998902767]
Text-to-image (T2I) diffusion models have shown remarkable success in generating high-quality images from text prompts. We propose a flexible feature injection framework that decouples the injection timestep from the denoising process. Our approach achieves state-of-the-art performance across diverse zero-shot conditioning scenarios.
arXiv Detail & Related papers (2025-07-03T16:56:15Z) - ExpertGen: Training-Free Expert Guidance for Controllable Text-to-Face Generation [49.294779074232686]
ExpertGen is a training-free framework that leverages pre-trained expert models to guide generation with fine control. We show qualitatively and quantitatively that expert models can guide the generation process with high precision.
arXiv Detail & Related papers (2025-05-22T20:09:21Z) - InstaRevive: One-Step Image Enhancement via Dynamic Score Matching [66.97989469865828]
InstaRevive is an image enhancement framework that employs score-based diffusion distillation to harness potent generative capability. Our framework delivers high-quality and visually appealing results across a diverse array of challenging tasks and datasets.
arXiv Detail & Related papers (2025-04-22T01:19:53Z) - A Controllable Appearance Representation for Flexible Transfer and Editing [0.44241702149260353]
We present a method that computes an interpretable representation of material appearance within a compact latent space. This representation is learned in a self-supervised fashion using an adapted FactorVAE. Our model demonstrates strong disentanglement and interpretability by effectively encoding material appearance and illumination.
arXiv Detail & Related papers (2025-04-21T11:29:06Z) - ICAS: IP Adapter and ControlNet-based Attention Structure for Multi-Subject Style Transfer Optimization [0.0]
ICAS is a novel framework for efficient and controllable multi-subject style transfer. Our framework ensures faithful global layout preservation alongside accurate local style synthesis. ICAS achieves superior performance in structure preservation, style consistency, and inference efficiency.
arXiv Detail & Related papers (2025-04-17T10:48:11Z) - Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models [53.73253164099701]
We introduce ColorWave, a training-free approach that achieves exact RGB-level color control in diffusion models without fine-tuning. We demonstrate that ColorWave establishes a new paradigm for structured, color-consistent diffusion-based image synthesis.
arXiv Detail & Related papers (2025-03-12T21:49:52Z) - Training-free Quantum-Inspired Image Edge Extraction Method [4.8188571652305185]
We propose a training-free, quantum-inspired edge detection model. Our approach integrates classical Sobel edge detection, a Schrödinger wave equation refinement, and a hybrid framework; a generic Sobel sketch appears after this list. By eliminating the need for training, the model is lightweight and adaptable to diverse applications.
arXiv Detail & Related papers (2025-01-31T07:24:38Z) - Learning from Pattern Completion: Self-supervised Controllable Generation [31.694486524155593]
We propose a self-supervised controllable generation (SCG) framework, inspired by the neural mechanisms that may contribute to the brain's associative power.
Experimental results demonstrate that the proposed modular autoencoder effectively achieves functional specialization.
Our approach not only demonstrates superior robustness in more challenging high-noise scenarios but also offers greater scalability potential owing to its self-supervised nature.
arXiv Detail & Related papers (2024-09-27T12:28:47Z) - ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z) - FilterPrompt: A Simple yet Efficient Approach to Guide Image Appearance Transfer in Diffusion Models [20.28288267660839]
FilterPrompt is an approach for enhancing the effect of controllable generation. It can be applied to any diffusion model, allowing users to adjust the representation of specific image features.
arXiv Detail & Related papers (2024-04-20T04:17:34Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional
Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - Toward Fast, Flexible, and Robust Low-Light Image Enhancement [87.27326390675155]
We develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening of images in real-world low-light scenarios.
To reduce the computational burden of the cascaded pattern, we construct a self-calibrated module that enforces convergence between the results of each stage.
We comprehensively explore SCI's inherent properties, including operation-insensitive adaptability and model-irrelevant generality.
arXiv Detail & Related papers (2022-04-21T14:40:32Z)
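The classical Sobel step named in the "Training-free Quantum-Inspired Image Edge Extraction" entry above is standard enough to sketch. Only the generic Sobel gradient-magnitude pass is shown; the cited paper's Schrödinger-equation refinement and hybrid framework are not reproduced, and the helper name is illustrative.

```python
# Generic Sobel gradient-magnitude pass (illustrative; not the cited paper's code).
import numpy as np
from scipy.ndimage import convolve

def sobel_edges(gray: np.ndarray) -> np.ndarray:
    """Return the Sobel gradient magnitude of a 2-D grayscale image."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    gx = convolve(gray.astype(float), kx, mode="nearest")  # horizontal gradient
    gy = convolve(gray.astype(float), ky, mode="nearest")  # vertical gradient
    return np.hypot(gx, gy)                                # edge strength per pixel
```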