Instance-aware Image Colorization with Controllable Textual Descriptions and Segmentation Masks
- URL: http://arxiv.org/abs/2505.08705v2
- Date: Thu, 25 Sep 2025 07:25:28 GMT
- Title: Instance-aware Image Colorization with Controllable Textual Descriptions and Segmentation Masks
- Authors: Yanru An, Ling Gui, Chunlei Cai, Tianxiao Ye, Jiangchao Yao, Guangtao Zhai, Qiang Hu, Xiaoyun Zhang
- Abstract summary: Current mainstream image colorization models face issues such as color bleeding and color binding errors. We propose a diffusion-based colorization method, MT-Color, to achieve precise instance-aware colorization with user-provided guidance. We have also created a specialized dataset for instance-level colorization tasks, GPT-color, by leveraging large visual language models on existing image datasets.
- Score: 60.495900243979754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, the application of deep learning to image colorization has received widespread attention, and the maturation of diffusion models has further advanced the development of image colorization models. However, current mainstream image colorization models still face issues such as color bleeding and color binding errors, and cannot colorize images at the instance level. In this paper, we propose a diffusion-based colorization method, MT-Color, to achieve precise instance-aware colorization with user-provided guidance. To tackle the color bleeding issue, we design a pixel-level mask attention mechanism that integrates latent features and conditional gray-image features through cross-attention. We use segmentation masks to construct cross-attention masks, preventing pixel information from being exchanged between different instances. We also introduce an instance mask and text guidance module that extracts instance masks and text representations of each instance, which are then fused with latent features through self-attention; instance masks form the self-attention masks, preventing instance texts from guiding the colorization of other areas and thus mitigating color binding errors. Furthermore, we apply a multi-instance sampling strategy, which samples each instance region separately and then fuses the results. Additionally, we have created a specialized dataset for instance-level colorization tasks, GPT-color, by leveraging large visual language models on existing image datasets. Qualitative and quantitative experiments show that our model and dataset outperform previous methods and datasets.
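To make the pixel-level mask attention mechanism concrete, the sketch below shows one hypothetical PyTorch implementation (this is not the authors' MT-Color code): a cross-attention mask is built from per-pixel instance ids so that latent-feature queries attend only to grayscale-feature keys belonging to the same instance. The tensor shapes, the assumption that grayscale features are spatially aligned with the latent, and all names are illustrative assumptions.

```python
# Minimal sketch of segmentation-mask-constrained cross-attention,
# assuming flattened (B, N, C) feature maps and per-pixel instance ids.
import torch

def build_instance_attention_mask(seg_ids: torch.Tensor) -> torch.Tensor:
    """seg_ids: (B, N) instance id per latent pixel (flattened H*W).
    Returns a boolean mask (B, N, N) that is True where attention is allowed,
    i.e. where the query and key pixels belong to the same instance."""
    return seg_ids.unsqueeze(2) == seg_ids.unsqueeze(1)  # (B, N, N)

def masked_cross_attention(latent: torch.Tensor,
                           gray_feats: torch.Tensor,
                           seg_ids: torch.Tensor,
                           w_q: torch.nn.Linear,
                           w_k: torch.nn.Linear,
                           w_v: torch.nn.Linear) -> torch.Tensor:
    """latent:     (B, N, C) denoiser features (queries)
       gray_feats: (B, N, C) conditional grayscale features (keys/values),
                   assumed spatially aligned with the latent."""
    q, k, v = w_q(latent), w_k(gray_feats), w_v(gray_feats)
    scores = q @ k.transpose(1, 2) / (q.shape[-1] ** 0.5)   # (B, N, N)
    allowed = build_instance_attention_mask(seg_ids)
    # Block information flow between pixels of different instances.
    scores = scores.masked_fill(~allowed, float("-inf"))
    return scores.softmax(dim=-1) @ v                        # (B, N, C)

# Toy usage with hypothetical shapes and a random 3-instance segmentation.
B, H, W, C = 1, 16, 16, 64
N = H * W
latent, gray_feats = torch.randn(B, N, C), torch.randn(B, N, C)
seg_ids = torch.randint(0, 3, (B, N))
wq, wk, wv = (torch.nn.Linear(C, C) for _ in range(3))
out = masked_cross_attention(latent, gray_feats, seg_ids, wq, wk, wv)  # (B, N, C)
```

The self-attention mask in the instance mask and text guidance module would follow the same pattern, except that each instance's text tokens are restricted to attend only to latent pixels inside that instance's mask.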
Related papers
- Content-Adaptive Image Retouching Guided by Attribute-Based Text Representation [53.196155487850746]
We propose a novel Content-Adaptive image retouching method guided by Attribute-based Text Representation (CA-ATP). Specifically, we propose a content-adaptive curve mapping module, which leverages a series of basis curves to establish multiple color mapping relationships. In addition, we propose an attribute text prediction module that generates text representations from multiple image attributes, which explicitly represent user-defined style preferences.
arXiv Detail & Related papers (2025-12-10T12:15:50Z) - MagicColor: Multi-Instance Sketch Colorization [44.72374445094054]
MagicColor is a diffusion-based framework for multi-instance sketch colorization. Our model critically automates the colorization process with zero manual adjustments.
arXiv Detail & Related papers (2025-03-21T08:53:14Z) - Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models [53.73253164099701]
We introduce ColorWave, a training-free approach that achieves exact RGB-level color control in diffusion models without fine-tuning. We demonstrate that ColorWave establishes a new paradigm for structured, color-consistent diffusion-based image synthesis.
arXiv Detail & Related papers (2025-03-12T21:49:52Z) - Image Referenced Sketch Colorization Based on Animation Creation Workflow [28.281739343084993]
We propose a diffusion-based framework inspired by real-world animation production. Our approach leverages the sketch as the spatial guidance and an RGB image as the color reference, and separately extracts foreground and background from the reference image with masks. This design allows the diffusion model to integrate information from foreground and background independently, preventing interference and eliminating spatial artifacts.
arXiv Detail & Related papers (2025-02-27T10:04:47Z) - ColorFlow: Retrieval-Augmented Image Sequence Colorization [65.93834649502898]
We propose a three-stage diffusion-based framework tailored for image sequence colorization in industrial applications. Unlike existing methods that require per-ID finetuning or explicit ID embedding extraction, we propose a novel Retrieval Augmented Colorization pipeline. Our pipeline also features a dual-branch design: one branch for color identity extraction and the other for colorization.
arXiv Detail & Related papers (2024-12-16T14:32:49Z) - ColorEdit: Training-free Image-Guided Color editing with diffusion model [23.519884152019642]
Text-to-image (T2I) diffusion models have been adopted for image editing tasks, demonstrating remarkable efficacy. However, due to attention leakage and collision between the cross-attention map of the object and the new color attribute from the text prompt, text-guided image editing methods may fail to change the color of an object. We propose a straightforward, yet stable and effective, image-guided method to modify the color of an object without requiring any additional fine-tuning or training.
arXiv Detail & Related papers (2024-11-15T14:45:58Z) - Outline-Guided Object Inpainting with Diffusion Models [11.391452115311798]
Instance segmentation datasets play a crucial role in training accurate and robust computer vision models.
We show how the scarcity of such annotated data can be mitigated by starting with small annotated instance segmentation datasets and augmenting them to obtain a sizeable annotated dataset.
We generate new images using a diffusion-based inpainting model to fill out the masked area with a desired object class by guiding the diffusion through the object outline.
arXiv Detail & Related papers (2024-02-26T09:21:17Z) - Control Color: Multimodal Diffusion-based Interactive Image Colorization [81.68817300796644]
Control Color (Ctrl Color) is a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model.
We present an effective way to encode user strokes to enable precise local color manipulation.
We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring.
arXiv Detail & Related papers (2024-02-16T17:51:13Z) - InstanceDiffusion: Instance-level Control for Image Generation [89.31908006870422]
InstanceDiffusion adds precise instance-level control to text-to-image diffusion models.
We propose three major changes to text-to-image models that enable precise instance-level control.
arXiv Detail & Related papers (2024-02-05T18:49:17Z) - UniGS: Unified Representation for Image Generation and Segmentation [105.08152635402858]
We use a colormap to represent entity-level masks, addressing the challenge of varying entity numbers.
Two novel modules, the location-aware color palette and the progressive dichotomy module, are proposed to support our mask representation.
arXiv Detail & Related papers (2023-12-04T15:59:27Z) - Instance-aware Image Colorization [51.12040118366072]
In this paper, we propose a method for achieving instance-aware colorization.
Our network architecture leverages an off-the-shelf object detector to obtain cropped object images.
We use a similar network to extract the full-image features and apply a fusion module to predict the final colors.
arXiv Detail & Related papers (2020-05-21T17:59:23Z)