Related papers: Training-free Color-Style Disentanglement for Constrained Text-to-Image Synthesis

URL: http://arxiv.org/abs/2409.02429v1
Date: Wed, 4 Sep 2024 04:16:58 GMT
Title: Training-free Color-Style Disentanglement for Constrained Text-to-Image Synthesis
Authors: Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan
Abstract summary: We present the first training-free, test-time-only method to disentangle and condition text-to-image models on color and style attributes from a reference image.
Score: 16.634138745034733
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of independently, in a disentangled fashion, controlling the outputs of text-to-image diffusion models with color and style attributes of a user-supplied reference image. We present the first training-free, test-time-only method to disentangle and condition text-to-image models on color and style attributes from a reference image. To realize this, we propose two key innovations. Our first contribution is to transform the latent codes at inference time using feature transformations that make the covariance matrix of the current generation follow that of the reference image, helping meaningfully transfer color. Next, we observe that there exists a natural disentanglement between color and style in the LAB image space, which we exploit to transform the self-attention feature maps of the image being generated with respect to those of the reference computed from its L channel. Both these operations happen purely at test time and can be done independently or merged. This results in a flexible method where color and style information can come from the same reference image or two different sources, and a new generation can seamlessly fuse them in either scenario.
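
The covariance-matching idea in the abstract can be illustrated with a standard whitening-and-coloring transform. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `match_covariance`, the `(channels, positions)` array layout, and the `eps` regularizer are all hypothetical choices made for illustration, and the abstract does not say at which denoising steps or on which latents the transform is applied.

```python
import numpy as np


def match_covariance(content, reference, eps=1e-5):
    """Give `content` features the channel mean and covariance of `reference`.

    Hypothetical helper for illustration only.
    content:   array of shape (C, N), C channels by N spatial positions.
    reference: array of shape (C, M), same channel count, any M.
    """
    c_mean = content.mean(axis=1, keepdims=True)
    r_mean = reference.mean(axis=1, keepdims=True)
    c = content - c_mean
    r = reference - r_mean

    # Channel covariance matrices, regularized so they stay invertible.
    identity = np.eye(c.shape[0])
    c_cov = c @ c.T / (c.shape[1] - 1) + eps * identity
    r_cov = r @ r.T / (r.shape[1] - 1) + eps * identity

    # Whitening: multiply by the inverse square root of the content covariance.
    c_vals, c_vecs = np.linalg.eigh(c_cov)
    whiten = c_vecs @ np.diag(c_vals ** -0.5) @ c_vecs.T

    # Coloring: multiply by the square root of the reference covariance.
    r_vals, r_vecs = np.linalg.eigh(r_cov)
    color = r_vecs @ np.diag(r_vals ** 0.5) @ r_vecs.T

    return color @ whiten @ c + r_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    content = rng.normal(size=(4, 4096))            # stand-in for a 4-channel latent
    reference = rng.normal(size=(4, 4)) @ rng.normal(size=(4, 2048)) + 3.0
    out = match_covariance(content, reference)
    # Largest absolute gap between the output and reference covariances.
    print(np.abs(np.cov(out) - np.cov(reference)).max())
```

After the transform, the channel covariance of the output agrees with that of the reference up to the small `eps` regularization, while the spatial arrangement of the content features is preserved because the same linear map is applied at every position.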

Related papers

Leveraging the Powerful Attention of a Pre-trained Diffusion Model for Exemplar-based Image Colorization [4.233370898095789]
Exemplar-based image colorization aims to colorize a grayscale image using a reference color image. We propose a novel, fine-tuning-free approach based on a pre-trained diffusion model. Our experimental results demonstrate that our method outperforms existing techniques in terms of image quality and fidelity to the reference.
arXiv Detail & Related papers (2025-05-21T17:59:40Z)
Color Conditional Generation with Sliced Wasserstein Guidance [44.99833362998488]
SW-Guidance is a training-free approach for image generation conditioned on the color distribution of a reference image. Our method outperforms state-of-the-art techniques for color-conditional generation in terms of color similarity to the reference.
arXiv Detail & Related papers (2025-03-24T18:06:03Z)
Free-Lunch Color-Texture Disentanglement for Stylized Image Generation [58.406368812760256]
This paper introduces the first tuning-free approach to achieve free-lunch color-texture disentanglement in stylized T2I generation. We develop techniques for separating and extracting Color-Texture Embeddings (CTE) from individual color and texture reference images. To ensure that the color palette of the generated image aligns closely with the color reference, we apply a whitening and coloring transformation.
arXiv Detail & Related papers (2025-03-18T14:10:43Z)
Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models [53.73253164099701]
We introduce ColorWave, a training-free approach that achieves exact RGB-level color control in diffusion models without fine-tuning. We demonstrate that ColorWave establishes a new paradigm for structured, color-consistent diffusion-based image synthesis.
arXiv Detail & Related papers (2025-03-12T21:49:52Z)
ColorEdit: Training-free Image-Guided Color editing with diffusion model [23.519884152019642]
Text-to-image (T2I) diffusion models have been adopted for image editing tasks, demonstrating remarkable efficacy. However, due to attention leakage and collision between the cross-attention map of the object and the new color attribute from the text prompt, text-guided image editing methods may fail to change the color of an object. We propose a straightforward, yet stable, and effective image-guided method to modify the color of an object without requiring any additional fine-tuning or training.
arXiv Detail & Related papers (2024-11-15T14:45:58Z)
Automatic Controllable Colorization via Imagination [55.489416987587305]
We propose a framework for automatic colorization that allows for iterative editing and modifications. By understanding the content within a grayscale image, we utilize a pre-trained image generation model to generate multiple images that contain the same content. These images serve as references for coloring, mimicking the process of human experts.
arXiv Detail & Related papers (2024-04-08T16:46:07Z)
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations [64.43387739794531]
Current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles. We introduce DEADiff to address this issue using the following two strategies. DEADiff attains the best visual stylization results and optimal balance between the text controllability inherent in the text-to-image model and style similarity to the reference image.
arXiv Detail & Related papers (2024-03-11T17:35:23Z)
Control Color: Multimodal Diffusion-based Interactive Image Colorization [81.68817300796644]
Control Color (Ctrl Color) is a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model. We present an effective way to encode user strokes to enable precise local color manipulation. We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring.
arXiv Detail & Related papers (2024-02-16T17:51:13Z)
DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models [12.897939032560537]
We propose a new method called DiffColor to recover vivid colors conditioned on a prompt text. We first fine-tune a pre-trained text-to-image model to generate colorized images using a CLIP-based contrastive loss. Then we try to obtain an optimized text embedding aligning the colorized image and the text prompt, and a fine-tuned diffusion model enabling high-quality image reconstruction. Our method can produce vivid and diverse colors with a few iterations, and keep the structure and background intact while having colors well-aligned with the target language guidance.
arXiv Detail & Related papers (2023-08-03T09:38:35Z)
Dequantization and Color Transfer with Diffusion Models [5.228564799458042]
Quantized images offer easy abstraction for patch-based edits and palette transfer. We show that our model can generate natural images that respect the color palette the user asked for. Our method can be usefully extended to another practical edit: recoloring patches of an image while respecting the source texture.
arXiv Detail & Related papers (2023-07-06T00:07:32Z)
Improved Diffusion-based Image Colorization via Piggybacked Models [19.807766482434563]
We introduce a colorization model piggybacking on the existing powerful T2I diffusion model. A diffusion guider is designed to incorporate the pre-trained weights of the latent diffusion model. A lightness-aware VQVAE will then generate the colorized result with pixel-perfect alignment to the given grayscale image.
arXiv Detail & Related papers (2023-04-21T16:23:24Z)
BiSTNet: Semantic Image Prior Guided Bidirectional Temporal Feature Fusion for Deep Exemplar-based Video Colorization [70.14893481468525]
We present an effective BiSTNet to explore colors of reference exemplars and utilize them to help video colorization. We first establish the semantic correspondence between each frame and the reference exemplars in deep feature space to explore color information from reference exemplars. We develop a mixed expert block to extract semantic information for modeling the object boundaries of frames so that the semantic image prior can better guide the colorization process.
arXiv Detail & Related papers (2022-12-05T13:47:15Z)
Detecting Recolored Image by Spatial Correlation [60.08643417333974]
Image recoloring is an emerging editing technique that can manipulate the color values of an image to give it a new style. In this paper, we explore a solution from the perspective of spatial correlation, which exhibits a generic detection capability for both conventional and deep learning-based recoloring. Our method achieves state-of-the-art detection accuracy on multiple benchmark datasets and generalizes well to unknown types of recoloring methods.
arXiv Detail & Related papers (2022-04-23T01:54:06Z)
Color2Style: Real-Time Exemplar-Based Image Colorization with Self-Reference Learning and Deep Feature Modulation [29.270149925368674]
We present a deep exemplar-based image colorization approach named Color2Style to resurrect grayscale image media by filling them with vibrant colors. Our method exploits a simple yet effective deep feature modulation (DFM) module, which injects the color embeddings extracted from the reference image into the deep representations of the input grayscale image.
arXiv Detail & Related papers (2021-06-15T10:05:58Z)
Deep Line Art Video Colorization with a Few References [49.7139016311314]
We propose a deep architecture to automatically color line art videos with the same color style as the given reference images. Our framework consists of a color transform network and a temporal constraint network. Our model can achieve even better coloring results by fine-tuning the parameters with only a small number of samples.
arXiv Detail & Related papers (2020-03-24T06:57:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.