Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation
- URL: http://arxiv.org/abs/2509.10058v1
- Date: Fri, 12 Sep 2025 08:44:22 GMT
- Title: Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation
- Authors: Sung-Lin Tsai, Bo-Lun Huang, Yu Ting Shen, Cheng Yu Yeo, Chiang Tseng, Bo-Kai Ruan, Wen-Sheng Lien, Hong-Han Shuai
- Abstract summary: Existing approaches rely on cross-attention manipulation, reference images, or fine-tuning, but fail to systematically resolve ambiguous color descriptions. We propose a training-free framework that enhances color fidelity by leveraging a large language model (LLM) to disambiguate color-related prompts. Our method first employs the LLM to resolve ambiguous color terms in the text prompt, and then refines the text embeddings based on the spatial relationships of the resulting color terms.
- Score: 21.37070510103594
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate color alignment in text-to-image (T2I) generation is critical for applications such as fashion, product visualization, and interior design, yet current diffusion models struggle with nuanced and compound color terms (e.g., Tiffany blue, lime green, hot pink), often producing images that are misaligned with human intent. Existing approaches rely on cross-attention manipulation, reference images, or fine-tuning but fail to systematically resolve ambiguous color descriptions. To precisely render colors under prompt ambiguity, we propose a training-free framework that enhances color fidelity by leveraging a large language model (LLM) to disambiguate color-related prompts and guiding color blending operations directly in the text embedding space. Our method first employs the LLM to resolve ambiguous color terms in the text prompt, and then refines the text embeddings based on the spatial relationships of the resulting color terms in the CIELAB color space. Unlike prior methods, our approach improves color accuracy without requiring additional training or external reference images. Experimental results demonstrate that our framework improves color alignment without compromising image quality, bridging the gap between text semantics and visual generation.
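To make the pipeline concrete, here is a minimal sketch of the core idea: an LLM-style disambiguation step (hard-coded here) maps a compound color term to an RGB value, and CLIP text embeddings of anchor color words are blended with weights derived from CIELAB distances. The anchor words, the inverse-distance weighting, and the specific RGB value are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of CIELAB-guided text-embedding blending (assumptions noted above).
import numpy as np
import torch
from skimage.color import rgb2lab
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

def to_lab(rgb):
    # Convert an sRGB triplet in [0, 255] to CIELAB.
    return rgb2lab(np.array(rgb, dtype=np.float64).reshape(1, 1, 3) / 255.0).reshape(3)

def encode(prompt):
    # Token-level CLIP text embeddings, as used to condition a diffusion model.
    tokens = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt")
    with torch.no_grad():
        return text_encoder(**tokens).last_hidden_state  # (1, 77, hidden_size)

# Stand-in for the LLM call: "Tiffany blue" resolved to an approximate RGB value.
target_lab = to_lab([10, 186, 181])
anchors = {"cyan": [0, 255, 255], "teal": [0, 128, 128]}  # illustrative anchor colors

# Weight each anchor by inverse CIELAB distance to the disambiguated target color.
dists = {name: np.linalg.norm(target_lab - to_lab(rgb)) for name, rgb in anchors.items()}
weights = {name: 1.0 / (d + 1e-6) for name, d in dists.items()}
total = sum(weights.values())
weights = {name: w / total for name, w in weights.items()}

# Blend the prompt embeddings obtained with each anchor color word.
blended = sum(weights[name] * encode(f"a {name} handbag") for name in anchors)
print({k: round(v, 3) for k, v in weights.items()}, tuple(blended.shape))
```

The blended embedding would then stand in for the original prompt embedding when conditioning the diffusion model.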
Related papers
- GenColorBench: A Color Evaluation Benchmark for Text-to-Image Generation Models [61.786094845872576]
We propose GenColorBench, the first comprehensive benchmark for text-to-image color generation. It is grounded in color systems such as ISCC-NBS and CSS3/X11, including numerical color specifications that are absent from other benchmarks. With 44K color-focused prompts covering 400+ colors, it reveals models' true capabilities via perceptual and automated assessments.
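GenColorBench's exact evaluation protocol is not reproduced here, but an automated color-fidelity check of the kind it describes can be sketched as follows: the prompt's CSS3/X11 target color is compared against the generated image's average color via CIEDE2000. The file name, the use of a plain image mean, and the pass threshold are assumptions for illustration.

```python
# Illustrative automated color check (not GenColorBench's actual protocol):
# compare a CSS3/X11 named color against the mean color of a generated image
# using CIEDE2000 in CIELAB space.
import numpy as np
from PIL import Image
from skimage.color import rgb2lab, deltaE_ciede2000
from matplotlib.colors import CSS4_COLORS, to_rgb

def mean_lab(image_path):
    rgb = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float64) / 255.0
    return rgb2lab(rgb).reshape(-1, 3).mean(axis=0)

def named_color_lab(name):
    rgb = np.array(to_rgb(CSS4_COLORS[name])).reshape(1, 1, 3)
    return rgb2lab(rgb).reshape(3)

target = named_color_lab("teal")                 # prompt asked for "a teal mug"
generated = mean_lab("generated_sample.png")     # hypothetical output image
delta_e = deltaE_ciede2000(target, generated)
print(f"CIEDE2000 = {delta_e:.2f}", "PASS" if delta_e < 20 else "FAIL")
```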
arXiv Detail & Related papers (2025-10-23T14:12:55Z) - Language-based Image Colorization: A Benchmark and Beyond [19.70668766997928]
Automatic image colorization methods struggle to generate high-quality images due to color ambiguity. Language-based colorization methods are proposed to fully utilize the efficiency and flexibility of text descriptions to guide colorization. This is the first comprehensive review and benchmark of the language-based image colorization field.
arXiv Detail & Related papers (2025-03-19T08:09:32Z) - Free-Lunch Color-Texture Disentanglement for Stylized Image Generation [58.406368812760256]
This paper introduces the first tuning-free approach to achieve free-lunch color-texture disentanglement in stylized T2I generation. We develop techniques for separating and extracting Color-Texture Embeddings (CTE) from individual color and texture reference images. To ensure that the color palette of the generated image aligns closely with the color reference, we apply a whitening and coloring transformation.
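The whitening and coloring transformation mentioned above is a standard statistical recipe; a generic sketch (covariance matching on flattened features, not this paper's specific variant) looks like this:

```python
# Generic whitening-and-coloring transform (WCT) sketch: re-color "content"
# features so their channel covariance matches a color-reference's statistics.
import torch

def wct(content, style, eps=1e-5):
    # content, style: (C, N) matrices of C-channel features over N positions.
    c_mean = content.mean(dim=1, keepdim=True)
    s_mean = style.mean(dim=1, keepdim=True)
    c_centered = content - c_mean
    s_centered = style - s_mean

    # Whiten content: remove its own channel covariance.
    c_cov = c_centered @ c_centered.T / (content.shape[1] - 1) + eps * torch.eye(content.shape[0])
    c_vals, c_vecs = torch.linalg.eigh(c_cov)
    whitened = c_vecs @ torch.diag(c_vals.clamp(min=eps).rsqrt()) @ c_vecs.T @ c_centered

    # Color with the reference covariance, then restore the reference mean.
    s_cov = s_centered @ s_centered.T / (style.shape[1] - 1) + eps * torch.eye(style.shape[0])
    s_vals, s_vecs = torch.linalg.eigh(s_cov)
    colored = s_vecs @ torch.diag(s_vals.clamp(min=eps).sqrt()) @ s_vecs.T @ whitened
    return colored + s_mean

# Example: treat an RGB image's pixels as 3-channel features (hypothetical data).
content = torch.rand(3, 64 * 64)
style = torch.rand(3, 64 * 64)
print(wct(content, style).shape)  # torch.Size([3, 4096])
```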
arXiv Detail & Related papers (2025-03-18T14:10:43Z) - ColorEdit: Training-free Image-Guided Color editing with diffusion model [23.519884152019642]
Text-to-image (T2I) diffusion models have been adopted for image editing tasks, demonstrating remarkable efficacy. However, due to attention leakage and collision between the cross-attention map of the object and the new color attribute from the text prompt, text-guided image editing methods may fail to change the color of an object. We propose a straightforward yet stable and effective image-guided method to modify the color of an object without requiring any additional fine-tuning or training.
arXiv Detail & Related papers (2024-11-15T14:45:58Z) - Paint Bucket Colorization Using Anime Character Color Design Sheets [72.66788521378864]
We introduce inclusion matching, which allows the network to understand the relationships between segments.
Our network's training pipeline significantly improves performance in both colorization and consecutive frame colorization.
To support our network's training, we have developed a unique dataset named PaintBucket-Character.
arXiv Detail & Related papers (2024-10-25T09:33:27Z) - L-C4: Language-Based Video Colorization for Creative and Consistent Color [59.069498113050436]
We present Language-based video colorization for Creative and Consistent Colors (L-C4).
Our model is built upon a pre-trained cross-modality generative model.
We propose temporally deformable attention to prevent flickering or color shifts, and cross-clip fusion to maintain long-term color consistency.
arXiv Detail & Related papers (2024-10-07T12:16:21Z) - Control Color: Multimodal Diffusion-based Interactive Image Colorization [81.68817300796644]
Control Color (Ctrl Color) is a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model.
We present an effective way to encode user strokes to enable precise local color manipulation.
We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring.
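As a rough illustration of what stroke encoding for local color control can look like (not Ctrl Color's actual module), user strokes can be rasterized into a sparse RGB-hint map plus a mask channel that a conditioning branch consumes; the stroke format below is an assumption.

```python
# Illustrative stroke-hint encoding: rasterize user color strokes into a
# 4-channel hint map (RGB hint + binary mask marking where hints exist).
import torch

def encode_strokes(strokes, height, width):
    # strokes: list of ((y0, x0, y1, x1), (r, g, b)) boxes with colors in [0, 1].
    hint = torch.zeros(4, height, width)
    for (y0, x0, y1, x1), rgb in strokes:
        hint[:3, y0:y1, x0:x1] = torch.tensor(rgb).view(3, 1, 1)
        hint[3, y0:y1, x0:x1] = 1.0   # mask channel marks hinted regions
    return hint

# Example: a hot-pink stroke on the top-left region of a 512x512 canvas.
hint = encode_strokes([((40, 40, 120, 200), (1.0, 0.41, 0.71))], 512, 512)
print(hint.shape, hint[3].sum().item())  # torch.Size([4, 512, 512]), hinted pixel count
```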
arXiv Detail & Related papers (2024-02-16T17:51:13Z) - Diffusing Colors: Image Colorization with Text Guided Diffusion [11.727899027933466]
We present a novel image colorization framework that utilizes image diffusion techniques with granular text prompts.
Our method provides a balance between automation and control, outperforming existing techniques in terms of visual quality and semantic coherence.
Our approach holds potential particularly for color enhancement and historical image colorization.
arXiv Detail & Related papers (2023-12-07T08:59:20Z) - DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models [12.897939032560537]
We propose a new method called DiffColor to recover vivid colors conditioned on a text prompt.
We first fine-tune a pre-trained text-to-image model to generate colorized images using a CLIP-based contrastive loss.
Then we try to obtain an optimized text embedding aligning the colorized image and the text prompt, and a fine-tuned diffusion model enabling high-quality image reconstruction.
Our method can produce vivid and diverse colors with a few iterations, and keep the structure and background intact while having colors well-aligned with the target language guidance.
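The CLIP-based contrastive loss used for fine-tuning is not spelled out in the abstract; a generic image-text InfoNCE-style stand-in, computed with Hugging Face's CLIP, might look like the sketch below (in actual training the image features would come from the colorization model's differentiable outputs rather than PIL images).

```python
# Generic CLIP image-text contrastive loss sketch (not DiffColor's exact objective):
# pull each colorized image toward its own prompt and away from the other prompts.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_contrastive_loss(images, prompts, temperature=0.07):
    inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True)
    image_emb = F.normalize(model.get_image_features(pixel_values=inputs["pixel_values"]), dim=-1)
    text_emb = F.normalize(model.get_text_features(input_ids=inputs["input_ids"],
                                                   attention_mask=inputs["attention_mask"]), dim=-1)
    logits = image_emb @ text_emb.T / temperature      # (B, B) cosine-similarity logits
    targets = torch.arange(len(prompts))               # matching pairs sit on the diagonal
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Hypothetical batch: two colorized outputs paired with their color prompts.
images = [Image.new("RGB", (224, 224), c) for c in ["red", "teal"]]
prompts = ["a red vintage car", "a teal vintage car"]
print(clip_contrastive_loss(images, prompts).item())
```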
arXiv Detail & Related papers (2023-08-03T09:38:35Z) - L-CAD: Language-based Colorization with Any-level Descriptions using Diffusion Priors [62.80068955192816]
We propose a unified model to perform language-based colorization with any-level descriptions.
We leverage the pretrained cross-modality generative model for its robust language understanding and rich color priors.
With the proposed novel sampling strategy, our model achieves instance-aware colorization in diverse and complex scenarios.
arXiv Detail & Related papers (2023-05-24T14:57:42Z)