DiffColor: Toward High Fidelity Text-Guided Image Colorization with
Diffusion Models
- URL: http://arxiv.org/abs/2308.01655v1
- Date: Thu, 3 Aug 2023 09:38:35 GMT
- Title: DiffColor: Toward High Fidelity Text-Guided Image Colorization with
Diffusion Models
- Authors: Jianxin Lin, Peng Xiao, Yijun Wang, Rongju Zhang, Xiangxiang Zeng
- Abstract summary: We propose a new method called DiffColor to recover vivid colors conditioned on a prompt text.
We first fine-tune a pre-trained text-to-image model to generate colorized images using a CLIP-based contrastive loss.
Then we try to obtain an optimized text embedding aligning the colorized image and the text prompt, and a fine-tuned diffusion model enabling high-quality image reconstruction.
Our method can produce vivid and diverse colors with a few iterations, and keep the structure and background intact while having colors well-aligned with the target language guidance.
- Score: 12.897939032560537
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent data-driven image colorization methods have enabled automatic or
reference-based colorization, but they still suffer from unsatisfactory and
inaccurate object-level color control. To address these issues, we propose a
new method called DiffColor that leverages the power of pre-trained diffusion
models to recover vivid colors conditioned on a prompt text, without any
additional inputs. DiffColor mainly contains two stages: colorization with
generative color prior and in-context controllable colorization. Specifically,
we first fine-tune a pre-trained text-to-image model to generate colorized
images using a CLIP-based contrastive loss. Then we try to obtain an optimized
text embedding aligning the colorized image and the text prompt, and a
fine-tuned diffusion model enabling high-quality image reconstruction. Our
method can produce vivid and diverse colors with a few iterations, and keep the
structure and background intact while having colors well-aligned with the
target language guidance. Moreover, our method allows for in-context
colorization, i.e., producing different colorization results by modifying
prompt texts without any fine-tuning, and can achieve object-level controllable
colorization results. Extensive experiments and user studies demonstrate that
DiffColor outperforms previous works in terms of visual quality, color
fidelity, and diversity of colorization options.
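The abstract mentions a CLIP-based contrastive loss but does not give its form. As a rough illustration only, the sketch below shows a generic InfoNCE-style contrastive loss over embedding vectors. The function names, the temperature value, and the toy vectors are assumptions made for this example; they are stand-ins for real CLIP embeddings and are not taken from the paper, whose actual loss may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """Generic InfoNCE-style loss for one anchor (hypothetical helper).

    The loss is small when the anchor is closer to the positive than to
    every negative, and large otherwise.
    """
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy 3-d vectors standing in for embeddings (assumed values).
image_emb = [0.9, 0.1, 0.0]         # embedding of a generated image
colored_text_emb = [1.0, 0.0, 0.0]  # embedding of a "colorful" prompt
gray_text_emb = [0.0, 1.0, 0.0]     # embedding of a "grayscale" prompt

loss_aligned = info_nce_loss(image_emb, colored_text_emb, [gray_text_emb])
loss_misaligned = info_nce_loss(image_emb, gray_text_emb, [colored_text_emb])
print(loss_aligned, loss_misaligned)
```

In this toy setup the loss is near zero when the image embedding sits close to the colored prompt and grows large when the roles of positive and negative are swapped, which is the behavior a contrastive objective of this kind relies on.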
Related papers
- Paint Bucket Colorization Using Anime Character Color Design Sheets [72.66788521378864]
We introduce inclusion matching, which allows the network to understand the relationships between segments.
Our network's training pipeline significantly improves performance in both colorization and consecutive frame colorization.
To support our network's training, we have developed a unique dataset named PaintBucket-Character.
arXiv Detail & Related papers (2024-10-25T09:33:27Z)
- L-C4: Language-Based Video Colorization for Creative and Consistent Color [59.069498113050436]
We present Language-based video colorization for Creative and Consistent Colors (L-C4).
Our model is built upon a pre-trained cross-modality generative model.
We propose temporally deformable attention to prevent flickering or color shifts, and cross-clip fusion to maintain long-term color consistency.
arXiv Detail & Related papers (2024-10-07T12:16:21Z)
- Automatic Controllable Colorization via Imagination [55.489416987587305]
We propose a framework for automatic colorization that allows for iterative editing and modifications.
By understanding the content within a grayscale image, we utilize a pre-trained image generation model to generate multiple images that contain the same content.
These images serve as references for coloring, mimicking the process of human experts.
arXiv Detail & Related papers (2024-04-08T16:46:07Z)
- Control Color: Multimodal Diffusion-based Interactive Image Colorization [81.68817300796644]
Control Color (Ctrl Color) is a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model.
We present an effective way to encode user strokes to enable precise local color manipulation.
We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring.
arXiv Detail & Related papers (2024-02-16T17:51:13Z)
- Diffusing Colors: Image Colorization with Text Guided Diffusion [11.727899027933466]
We present a novel image colorization framework that utilizes image diffusion techniques with granular text prompts.
Our method provides a balance between automation and control, outperforming existing techniques in terms of visual quality and semantic coherence.
Our approach holds potential particularly for color enhancement and historical image colorization.
arXiv Detail & Related papers (2023-12-07T08:59:20Z)
- L-CAD: Language-based Colorization with Any-level Descriptions using Diffusion Priors [62.80068955192816]
We propose a unified model to perform language-based colorization with any-level descriptions.
We leverage the pretrained cross-modality generative model for its robust language understanding and rich color priors.
With the proposed novel sampling strategy, our model achieves instance-aware colorization in diverse and complex scenarios.
arXiv Detail & Related papers (2023-05-24T14:57:42Z)
- MMC: Multi-Modal Colorization of Images using Textual Descriptions [22.666387184216678]
We propose a deep network that takes two inputs (grayscale image and the respective encoded text description) and tries to predict the relevant color components.
We also predict each object in the image and colorize each one using its individual description, so that its specific attributes are incorporated into the colorization process.
The proposed method outperforms existing colorization techniques on the LPIPS, PSNR, and SSIM metrics.
arXiv Detail & Related papers (2023-04-24T10:53:13Z)
- Improved Diffusion-based Image Colorization via Piggybacked Models [19.807766482434563]
We introduce a colorization model piggybacking on the existing powerful T2I diffusion model.
A diffusion guider is designed to incorporate the pre-trained weights of the latent diffusion model.
A lightness-aware VQVAE will then generate the colorized result with pixel-perfect alignment to the given grayscale image.
arXiv Detail & Related papers (2023-04-21T16:23:24Z)
- TIC: Text-Guided Image Colorization [24.317541784957285]
We propose a novel deep network that takes two inputs (the grayscale image and the respective encoded text description) and tries to predict the relevant color gamut.
As the respective textual descriptions contain color information of the objects present in the scene, the text encoding helps to improve the overall quality of the predicted colors.
We have evaluated our proposed model using different metrics and found that it outperforms the state-of-the-art colorization algorithms both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-08-04T18:40:20Z)
- Image Colorization: A Survey and Dataset [94.59768013860668]
This article presents a comprehensive survey of state-of-the-art deep learning-based image colorization techniques.
It categorizes the existing colorization techniques into seven classes and discusses important factors governing their performance.
We perform an extensive experimental evaluation of existing image colorization methods using both existing datasets and our proposed one.
arXiv Detail & Related papers (2020-08-25T01:22:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.