Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models
- URL: http://arxiv.org/abs/2503.09864v1
- Date: Wed, 12 Mar 2025 21:49:52 GMT
- Title: Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models
- Authors: Héctor Laria, Alexandra Gomez-Villa, Jiang Qin, Muhammad Atif Butt, Bogdan Raducanu, Javier Vazquez-Corral, Joost van de Weijer, Kai Wang
- Abstract summary: We introduce ColorWave, a training-free approach that achieves exact RGB-level color control in diffusion models without fine-tuning. We demonstrate that ColorWave establishes a new paradigm for structured, color-consistent diffusion-based image synthesis.
- Score: 53.73253164099701
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advances in text-to-image (T2I) diffusion models have enabled remarkable control over various attributes, yet precise color specification remains a fundamental challenge. Existing approaches, such as ColorPeel, rely on model personalization, requiring additional optimization and limiting flexibility in specifying arbitrary colors. In this work, we introduce ColorWave, a novel training-free approach that achieves exact RGB-level color control in diffusion models without fine-tuning. By systematically analyzing the cross-attention mechanisms within IP-Adapter, we uncover an implicit binding between textual color descriptors and reference image features. Leveraging this insight, our method rewires these bindings to enforce precise color attribution while preserving the generative capabilities of pretrained models. Our approach maintains generation quality and diversity, outperforming prior methods in accuracy and applicability across diverse object categories. Through extensive evaluations, we demonstrate that ColorWave establishes a new paradigm for structured, color-consistent diffusion-based image synthesis.
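The abstract describes the rewiring mechanism only at a high level. As a minimal illustration of the stated idea, the sketch below overwrites the cross-attention value vectors at a color token's position with a projection of the exact target RGB, so attending to that token injects the requested color. All names here (`rewire_color_binding`, `color_proj`) and the shapes are illustrative assumptions, not the authors' implementation.

```python
import torch

def rewire_color_binding(values, color_token_idx, target_rgb, color_proj):
    """Hypothetical sketch of ColorWave-style rewiring: replace the
    cross-attention value vectors at the color token's position with a
    projection of an exact RGB target, so attention to that token
    injects the requested color into the image features.

    values:          (batch, num_tokens, dim) value matrix of one layer
    color_token_idx: index of the color word (e.g. "red") in the prompt
    target_rgb:      (3,) tensor in [0, 1], the exact color to enforce
    color_proj:      a 3 -> dim projection (an assumption of this sketch)
    """
    values = values.clone()
    values[:, color_token_idx, :] = color_proj(target_rgb)
    return values

# Toy usage with random tensors standing in for real model activations.
dim = 64
values = torch.randn(1, 77, dim)            # 77 CLIP text tokens
color_proj = torch.nn.Linear(3, dim)        # illustrative projection
target = torch.tensor([0.85, 0.10, 0.10])   # an exact red
values = rewire_color_binding(values, 3, target, color_proj)
```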
Related papers
- Free-Lunch Color-Texture Disentanglement for Stylized Image Generation [58.406368812760256]
This paper introduces the first tuning-free approach to achieve free-lunch color-texture disentanglement in stylized T2I generation.
We develop techniques for separating and extracting Color-Texture Embeddings (CTE) from individual color and texture reference images.
To ensure that the color palette of the generated image aligns closely with the color reference, we apply a whitening and coloring transformation (a generic version is sketched below).
arXiv Detail & Related papers (2025-03-18T14:10:43Z)
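The whitening-and-coloring transformation named above is a standard statistical color-matching operation. A minimal generic version over RGB pixels (not the authors' implementation) might look like:

```python
import numpy as np

def wct_color_transfer(content, reference, eps=1e-5):
    """Generic whitening-and-coloring transform (WCT): whiten the content
    image's color distribution, then re-color it with the reference
    image's color covariance and mean.

    content, reference: float arrays of shape (H, W, 3) in [0, 1]
    """
    x = content.reshape(-1, 3).T                  # (3, N)
    y = reference.reshape(-1, 3).T
    mu_x, mu_y = x.mean(1, keepdims=True), y.mean(1, keepdims=True)
    xc, yc = x - mu_x, y - mu_y

    # Whitening: remove the content image's color covariance.
    ex, vx = np.linalg.eigh(xc @ xc.T / xc.shape[1] + eps * np.eye(3))
    whiten = vx @ np.diag(ex ** -0.5) @ vx.T

    # Coloring: impose the reference image's color covariance.
    ey, vy = np.linalg.eigh(yc @ yc.T / yc.shape[1] + eps * np.eye(3))
    color = vy @ np.diag(ey ** 0.5) @ vy.T

    out = color @ (whiten @ xc) + mu_y
    return np.clip(out.T.reshape(content.shape), 0.0, 1.0)
```

Whitening removes the content's color statistics; coloring imposes the reference's, so the output inherits the reference palette while keeping spatial content.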
- Color Alignment in Diffusion [29.15171578869268]
Diffusion models have shown great promise in synthesizing visually appealing images. We introduce a novel color alignment algorithm that confines the generative process in diffusion models within a given color pattern (a generic projection of this kind is sketched below). Results demonstrate state-of-the-art performance in conditioning and controlling color pixels, while maintaining on-par generation quality and diversity.
arXiv Detail & Related papers (2025-03-09T20:02:52Z)
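The summary does not say how generation is confined to a color pattern. One plausible generic mechanism, shown only as a stand-in for the paper's algorithm, is to softly pull each pixel of an intermediate decoded estimate toward its nearest color in the allowed palette at every denoising step:

```python
import numpy as np

def project_to_palette(image, palette, strength=0.5):
    """Soft projection of an image onto an allowed color palette.
    `strength` (an assumed knob) trades fidelity to the sample against
    fidelity to the palette.

    image:   (H, W, 3) float array in [0, 1]
    palette: (K, 3) float array of allowed colors
    """
    flat = image.reshape(-1, 3)                            # (N, 3)
    d = ((flat[:, None, :] - palette[None]) ** 2).sum(-1)  # (N, K)
    nearest = palette[d.argmin(1)]                         # snap to palette
    out = (1.0 - strength) * flat + strength * nearest     # soft blend
    return out.reshape(image.shape)
```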
- MangaNinja: Line Art Colorization with Precise Reference Following [84.2001766692797]
MangaNinja specializes in reference-guided line art colorization. We incorporate two designs to ensure precise character detail transcription: a patch shuffling module to facilitate correspondence learning between the reference color image and the target line art (a toy version is sketched below), and a point-driven control scheme to enable fine-grained color matching.
arXiv Detail & Related papers (2025-01-14T18:59:55Z)
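As a toy illustration of the patch-shuffling idea above: cutting the reference into a grid and permuting the tiles removes positional shortcuts, so the model must match patches to the line art semantically. The patch size and where this sits in MangaNinja's pipeline are assumptions.

```python
import torch

def shuffle_patches(img, patch=32, generator=None):
    """Cut an image into a grid of square patches and permute them.

    img: (C, H, W) tensor with H and W divisible by `patch`
    """
    c, h, w = img.shape
    gh, gw = h // patch, w // patch
    # (C, gh, patch, gw, patch) -> (gh*gw, C, patch, patch)
    tiles = (img.reshape(c, gh, patch, gw, patch)
                .permute(1, 3, 0, 2, 4)
                .reshape(gh * gw, c, patch, patch))
    tiles = tiles[torch.randperm(gh * gw, generator=generator)]
    # Reassemble the shuffled tiles back into an image.
    return (tiles.reshape(gh, gw, c, patch, patch)
                 .permute(2, 0, 3, 1, 4)
                 .reshape(c, h, w))
```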
- ColorFlow: Retrieval-Augmented Image Sequence Colorization [65.93834649502898]
We propose a three-stage diffusion-based framework tailored for image sequence colorization in industrial applications. Unlike existing methods that require per-ID finetuning or explicit ID embedding extraction, we propose a novel Retrieval Augmented Colorization pipeline (its retrieval step is sketched generically below). Our pipeline also features a dual-branch design: one branch for color identity extraction and the other for colorization.
arXiv Detail & Related papers (2024-12-16T14:32:49Z)
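The retrieval step of a retrieval-augmented pipeline can be illustrated generically as nearest-neighbor search over image embeddings. The embedding model and `k` below are assumptions, not details from ColorFlow:

```python
import numpy as np

def retrieve_references(query_emb, pool_embs, k=3):
    """Rank a pool of colored reference images by cosine similarity of
    their embeddings to a query (e.g. a grayscale frame) and return the
    indices of the top-k matches.
    """
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    return np.argsort(-(p @ q))[:k]
```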
- Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI [59.96044730204345]
We introduce Derivative-Free Diffusion Manifold-Constrained Gradients (FreeMCG); a plain zeroth-order estimator illustrating the derivative-free part is sketched below.
FreeMCG serves as an improved basis for explainability of a given neural network.
We show that our method yields state-of-the-art results while preserving the essential properties expected of XAI tools.
arXiv Detail & Related papers (2024-11-22T11:15:14Z)
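"Derivative-free gradients" generally means zeroth-order estimation. The sketch below is the plain Gaussian-smoothing estimator with antithetic sampling; FreeMCG's defining addition, constraining the estimate to the diffusion data manifold, is deliberately omitted:

```python
import numpy as np

def derivative_free_grad(f, x, sigma=0.05, n_samples=32, rng=None):
    """Zeroth-order gradient estimate of a black-box scalar function f
    at x via Gaussian smoothing: average f-weighted random directions.
    Antithetic pairs (+u, -u) reduce the estimator's variance.
    """
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        grad += (f(x + sigma * u) - f(x - sigma * u)) / (2 * sigma) * u
    return grad / n_samples

# Toy check against a known gradient: f(x) = ||x||^2 has gradient 2x.
x = np.array([1.0, -2.0, 0.5])
g = derivative_free_grad(lambda v: float(v @ v), x, n_samples=512)
```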
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control (a toy merging pass is sketched below).
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
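One way to amalgamate redundant features is to average near-duplicate tokens, shrinking the sequence that attention must process. The greedy cosine-similarity pass below is an illustrative stand-in; ZePo's actual merging strategy is not specified in this summary:

```python
import torch

def merge_redundant_tokens(feats, threshold=0.9):
    """Greedily group tokens whose cosine similarity exceeds a threshold
    and replace each group by its mean.

    feats: (num_tokens, dim) feature sequence
    """
    normed = torch.nn.functional.normalize(feats, dim=-1)
    sims = normed @ normed.T
    kept, used = [], torch.zeros(len(feats), dtype=torch.bool)
    for i in range(len(feats)):
        if used[i]:
            continue
        group = (~used) & (sims[i] > threshold)  # i plus its duplicates
        used |= group
        kept.append(feats[group].mean(0))
    return torch.stack(kept)
```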
- Control Color: Multimodal Diffusion-based Interactive Image Colorization [81.68817300796644]
Control Color (Ctrl Color) is a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model.
We present an effective way to encode user strokes to enable precise local color manipulation (a generic stroke encoding is sketched below).
We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring.
arXiv Detail & Related papers (2024-02-16T17:51:13Z)
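A common way to feed user strokes to a diffusion-based colorizer is to rasterize them into a sparse RGB hint image plus a binary mask channel that can be concatenated to the model input. This generic encoding, which may differ from Ctrl Color's, is sketched below:

```python
import numpy as np

def strokes_to_hint_map(strokes, h, w):
    """Rasterize user color strokes into an (H, W, 4) conditioning map:
    three RGB hint channels plus a mask marking where hints exist.

    strokes: list of (row, col, (r, g, b)) annotated pixels in [0, 1]
    """
    hint = np.zeros((h, w, 3), dtype=np.float32)
    mask = np.zeros((h, w, 1), dtype=np.float32)
    for r, c, rgb in strokes:
        hint[r, c] = rgb
        mask[r, c] = 1.0
    return np.concatenate([hint, mask], axis=-1)
```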
- Diffusing Colors: Image Colorization with Text Guided Diffusion [11.727899027933466]
We present a novel image colorization framework that utilizes image diffusion techniques with granular text prompts.
Our method provides a balance between automation and control, outperforming existing techniques in terms of visual quality and semantic coherence.
Our approach holds potential particularly for color enhancement and historical image colorization.
arXiv Detail & Related papers (2023-12-07T08:59:20Z)
- DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models [12.897939032560537]
We propose a new method called DiffColor to recover vivid colors conditioned on a text prompt.
We first fine-tune a pre-trained text-to-image model to generate colorized images using a CLIP-based contrastive loss.
Then we obtain an optimized text embedding that aligns the colorized image with the text prompt (the optimization loop is sketched below), and a fine-tuned diffusion model that enables high-quality image reconstruction.
Our method produces vivid and diverse colors within a few iterations, keeping the structure and background intact while aligning colors with the target language guidance.
arXiv Detail & Related papers (2023-08-03T09:38:35Z)
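The embedding-optimization step can be sketched generically: treat the text embedding as a free parameter and ascend its cosine similarity to the fixed image embedding of the colorized result. DiffColor works with CLIP encoders and additional losses; here both embeddings are plain tensors so the loop stays self-contained:

```python
import torch

def optimize_text_embedding(img_emb, text_emb, steps=100, lr=1e-2):
    """Gradient-ascend a text embedding toward a fixed image embedding
    under cosine similarity, starting from the prompt's embedding.
    """
    emb = text_emb.clone().requires_grad_(True)
    opt = torch.optim.Adam([emb], lr=lr)
    for _ in range(steps):
        loss = -torch.nn.functional.cosine_similarity(emb, img_emb, dim=-1)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return emb.detach()

# Toy usage with random stand-ins for encoder outputs.
emb = optimize_text_embedding(torch.randn(512), torch.randn(512))
```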
- Video Colorization with Pre-trained Text-to-Image Diffusion Models [19.807766482434563]
We present ColorDiffuser, an adaptation of a pre-trained text-to-image latent diffusion model for video colorization.
We propose two novel techniques to enhance temporal coherence and maintain the vividness of colorization across frames (the simplest such coherence penalty is sketched below).
arXiv Detail & Related papers (2023-06-02T17:58:00Z)
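The simplest member of the temporal-coherence family alluded to above is a plain frame-difference penalty; ColorDiffuser's actual techniques are not described in this summary, so the sketch below is only a baseline illustration (no optical-flow warping):

```python
import torch

def temporal_color_loss(frames):
    """Penalize abrupt color changes between consecutive colorized
    frames of a clip.

    frames: (T, C, H, W) tensor of colorized video frames
    """
    return (frames[1:] - frames[:-1]).abs().mean()
```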