Diffusing Colors: Image Colorization with Text Guided Diffusion
- URL: http://arxiv.org/abs/2312.04145v1
- Date: Thu, 7 Dec 2023 08:59:20 GMT
- Title: Diffusing Colors: Image Colorization with Text Guided Diffusion
- Authors: Nir Zabari, Aharon Azulay, Alexey Gorkor, Tavi Halperin, Ohad Fried
- Abstract summary: We present a novel image colorization framework that utilizes image diffusion techniques with granular text prompts.
Our method provides a balance between automation and control, outperforming existing techniques in terms of visual quality and semantic coherence.
Our approach holds particular promise for color enhancement and historical image colorization.
- Score: 11.727899027933466
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The colorization of grayscale images is a complex and subjective task with
significant challenges. Despite recent progress in employing large-scale
datasets with deep neural networks, difficulties with controllability and
visual quality persist. To tackle these issues, we present a novel image
colorization framework that utilizes image diffusion techniques with granular
text prompts. This integration not only produces colorization outputs that are
semantically appropriate but also greatly improves the level of control users
have over the colorization process. Our method provides a balance between
automation and control, outperforming existing techniques in terms of visual
quality and semantic coherence. We leverage a pretrained generative Diffusion
Model, and show that we can finetune it for the colorization task without
losing its generative power or attention to text prompts. Moreover, we present
a novel CLIP-based ranking model that evaluates color vividness, enabling
automatic selection of the most suitable level of vividness based on the
specific scene semantics. Our approach holds particular promise for color
enhancement and historical image colorization.
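The CLIP-based vividness ranking described above can be illustrated with a minimal sketch: score each candidate colorization's image embedding against a "vivid" and a "dull" text anchor, and sort by the difference. The vectors below are toy 3-D stand-ins for real CLIP features, and the anchor prompts and scoring rule are assumptions for illustration, not the paper's actual ranking model.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_vividness(image_embeddings, vivid_anchor, dull_anchor):
    # Score = similarity to the "vivid" text anchor minus similarity to the
    # "dull" one; higher means the candidate reads as more vivid to CLIP.
    scores = [cosine(e, vivid_anchor) - cosine(e, dull_anchor)
              for e in image_embeddings]
    order = np.argsort(scores)[::-1]  # indices, most vivid first
    return order.tolist(), scores

# Toy 3-D "embeddings" standing in for real CLIP vectors.
vivid = np.array([1.0, 0.0, 0.0])   # e.g. embedding of "a vivid colorful photo"
dull = np.array([0.0, 1.0, 0.0])    # e.g. embedding of "a dull desaturated photo"
candidates = [
    np.array([0.9, 0.1, 0.0]),      # close to the vivid anchor
    np.array([0.1, 0.9, 0.0]),      # close to the dull anchor
    np.array([0.5, 0.5, 0.0]),      # in between
]
order, scores = rank_by_vividness(candidates, vivid, dull)
print(order)  # [0, 2, 1]
```

In the real system the anchors and image features would come from a pretrained CLIP encoder; the contrastive two-anchor score is just one plausible way to turn similarities into a vividness ordering.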
Related papers
- L-C4: Language-Based Video Colorization for Creative and Consistent Color [59.069498113050436]
We present Language-based video colorization for Creative and Consistent Colors (L-C4).
Our model is built upon a pre-trained cross-modality generative model.
We propose temporally deformable attention to prevent flickering or color shifts, and cross-clip fusion to maintain long-term color consistency.
arXiv Detail & Related papers (2024-10-07T12:16:21Z) - Transforming Color: A Novel Image Colorization Method [8.041659727964305]
This paper introduces a novel method for image colorization that utilizes a color transformer and generative adversarial networks (GANs).
The proposed method integrates a transformer architecture to capture global information and a GAN framework to improve visual quality.
Experimental results show that the proposed network significantly outperforms other state-of-the-art colorization techniques.
arXiv Detail & Related papers (2024-10-07T07:23:42Z) - Automatic Controllable Colorization via Imagination [55.489416987587305]
We propose a framework for automatic colorization that allows for iterative editing and modifications.
By understanding the content within a grayscale image, we utilize a pre-trained image generation model to generate multiple images that contain the same content.
These images serve as references for coloring, mimicking the process of human experts.
arXiv Detail & Related papers (2024-04-08T16:46:07Z) - Control Color: Multimodal Diffusion-based Interactive Image Colorization [81.68817300796644]
Control Color (Ctrl Color) is a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model.
We present an effective way to encode user strokes to enable precise local color manipulation.
We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring.
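Encoding user strokes as a conditioning signal, as Ctrl Color does for local color manipulation, can be sketched generically: rasterize each stroke into a sparse chrominance-hint map plus a binary mask the model can consume. The layout below (Lab ab hints concatenated with a mask channel) is a common convention in user-guided colorization, not the paper's exact encoding; the function name and stroke format are hypothetical.

```python
import numpy as np

def encode_strokes(height, width, strokes):
    # Rasterize user strokes into a sparse color-hint tensor plus a mask.
    # strokes: list of (row, col, (a, b)) pixel locations with target
    # chrominance values (Lab ab channels).
    hints = np.zeros((height, width, 2), dtype=np.float32)
    mask = np.zeros((height, width, 1), dtype=np.float32)
    for r, c, (a, b) in strokes:
        hints[r, c] = (a, b)   # desired color at the stroked pixel
        mask[r, c] = 1.0       # mark this pixel as user-constrained
    # Concatenate into one (H, W, 3) conditioning map for the model.
    return np.concatenate([hints, mask], axis=-1)

cond = encode_strokes(4, 4, [(1, 2, (0.3, -0.2))])
print(cond.shape)  # (4, 4, 3)
```

A diffusion model would receive this map alongside the grayscale input, so the mask tells it where the hints are authoritative and where it should colorize freely.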
arXiv Detail & Related papers (2024-02-16T17:51:13Z) - Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics [54.980359694044566]
This paper utilizes the corresponding audio track, which naturally carries extra semantic information about the same scene.
Experiments demonstrate that audio guidance can effectively improve the performance of automatic colorization.
arXiv Detail & Related papers (2024-01-24T07:22:05Z) - DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models [12.897939032560537]
We propose a new method called DiffColor to recover vivid colors conditioned on a prompt text.
We first fine-tune a pre-trained text-to-image model to generate colorized images using a CLIP-based contrastive loss.
Then we optimize a text embedding that aligns the colorized image with the text prompt, and fine-tune the diffusion model to enable high-quality image reconstruction.
Our method can produce vivid and diverse colors with a few iterations, and keep the structure and background intact while having colors well-aligned with the target language guidance.
arXiv Detail & Related papers (2023-08-03T09:38:35Z) - Improved Diffusion-based Image Colorization via Piggybacked Models [19.807766482434563]
We introduce a colorization model piggybacking on the existing powerful T2I diffusion model.
A diffusion guider is designed to incorporate the pre-trained weights of the latent diffusion model.
A lightness-aware VQVAE then generates the colorized result with pixel-perfect alignment to the given grayscale image.
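The "pixel-perfect alignment" property can be approximated post hoc: keep the input grayscale image as the L channel and take only the chrominance from the model output. This is a simplification of what a lightness-aware decoder learns end-to-end; the helper below is a hypothetical sketch, not the paper's VQVAE.

```python
import numpy as np

def enforce_lightness(gray_l, generated_lab):
    # Replace the generated L channel with the input grayscale lightness,
    # guaranteeing pixel-aligned luminance regardless of what the model did
    # to the ab (chrominance) channels.
    out = generated_lab.copy()
    out[..., 0] = gray_l
    return out

gray = np.full((2, 2), 50.0)             # input lightness (Lab L channel)
pred = np.random.rand(2, 2, 3) * 100.0   # model output in Lab
fixed = enforce_lightness(gray, pred)
print(np.allclose(fixed[..., 0], gray))  # True
```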
arXiv Detail & Related papers (2023-04-21T16:23:24Z) - PalGAN: Image Colorization with Palette Generative Adversarial Networks [51.59276436217957]
We propose a new GAN-based colorization approach PalGAN, integrated with palette estimation and chromatic attention.
PalGAN outperforms the state of the art in quantitative evaluation and visual comparison, delivering notably diverse, contrastive, and edge-preserving appearances.
arXiv Detail & Related papers (2022-10-20T12:28:31Z) - TIC: Text-Guided Image Colorization [24.317541784957285]
We propose a novel deep network that takes two inputs (the grayscale image and the respective encoded text description) and predicts the relevant color gamut.
As the respective textual descriptions contain color information of the objects present in the scene, the text encoding helps to improve the overall quality of the predicted colors.
We have evaluated our proposed model using different metrics and found that it outperforms the state-of-the-art colorization algorithms both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-08-04T18:40:20Z) - Attention-based Stylisation for Exemplar Image Colourisation [3.491870689686827]
This work reformulates the existing methodology, introducing a novel end-to-end colourisation network.
The proposed architecture integrates attention modules at different resolutions that learn how to perform the style transfer task.
Experimental validations demonstrate the efficiency of the proposed methodology, which generates high-quality and visually appealing colourisations.
arXiv Detail & Related papers (2021-05-04T18:56:26Z) - Image Colorization: A Survey and Dataset [94.59768013860668]
This article presents a comprehensive survey of state-of-the-art deep learning-based image colorization techniques.
It categorizes the existing colorization techniques into seven classes and discusses important factors governing their performance.
We perform an extensive experimental evaluation of existing image colorization methods using both existing datasets and our proposed one.
arXiv Detail & Related papers (2020-08-25T01:22:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.