Training-free Color-Style Disentanglement for Constrained Text-to-Image Synthesis
- URL: http://arxiv.org/abs/2409.02429v1
- Date: Wed, 4 Sep 2024 04:16:58 GMT
- Title: Training-free Color-Style Disentanglement for Constrained Text-to-Image Synthesis
- Authors: Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan,
- Abstract summary: We present the first training-free, test-time-only method to disentangle and condition text-to-image models on color and style attributes from reference image.
- Score: 16.634138745034733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of independently, in a disentangled fashion, controlling the outputs of text-to-image diffusion models with color and style attributes of a user-supplied reference image. We present the first training-free, test-time-only method to disentangle and condition text-to-image models on color and style attributes from reference image. To realize this, we propose two key innovations. Our first contribution is to transform the latent codes at inference time using feature transformations that make the covariance matrix of current generation follow that of the reference image, helping meaningfully transfer color. Next, we observe that there exists a natural disentanglement between color and style in the LAB image space, which we exploit to transform the self-attention feature maps of the image being generated with respect to those of the reference computed from its L channel. Both these operations happen purely at test time and can be done independently or merged. This results in a flexible method where color and style information can come from the same reference image or two different sources, and a new generation can seamlessly fuse them in either scenario.
Related papers
- ColorEdit: Training-free Image-Guided Color editing with diffusion model [23.519884152019642]
Text-to-image (T2I) diffusion models have been adopted for image editing tasks, demonstrating remarkable efficacy.
However, due to attention leakage and collision between the cross-attention map of the object and the new color attribute from the text prompt, text-guided image editing methods may fail to change the color of an object.
We propose a straightforward, yet stable, and effective image-guided method to modify the color of an object without requiring any additional fine-tuning or training.
arXiv Detail & Related papers (2024-11-15T14:45:58Z) - Automatic Controllable Colorization via Imagination [55.489416987587305]
We propose a framework for automatic colorization that allows for iterative editing and modifications.
By understanding the content within a grayscale image, we utilize a pre-trained image generation model to generate multiple images that contain the same content.
These images serve as references for coloring, mimicking the process of human experts.
arXiv Detail & Related papers (2024-04-08T16:46:07Z) - DEADiff: An Efficient Stylization Diffusion Model with Disentangled
Representations [64.43387739794531]
Current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles.
We introduce DEADiff to address this issue using the following two strategies.
DEAiff attains the best visual stylization results and optimal balance between the text controllability inherent in the text-to-image model and style similarity to the reference image.
arXiv Detail & Related papers (2024-03-11T17:35:23Z) - Control Color: Multimodal Diffusion-based Interactive Image Colorization [81.68817300796644]
Control Color (Ctrl Color) is a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model.
We present an effective way to encode user strokes to enable precise local color manipulation.
We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring.
arXiv Detail & Related papers (2024-02-16T17:51:13Z) - DiffColor: Toward High Fidelity Text-Guided Image Colorization with
Diffusion Models [12.897939032560537]
We propose a new method called DiffColor to recover vivid colors conditioned on a prompt text.
We first fine-tune a pre-trained text-to-image model to generate colorized images using a CLIP-based contrastive loss.
Then we try to obtain an optimized text embedding aligning the colorized image and the text prompt, and a fine-tuned diffusion model enabling high-quality image reconstruction.
Our method can produce vivid and diverse colors with a few iterations, and keep the structure and background intact while having colors well-aligned with the target language guidance.
arXiv Detail & Related papers (2023-08-03T09:38:35Z) - Dequantization and Color Transfer with Diffusion Models [5.228564799458042]
quantized images offer easy abstraction for patch-based edits and palette transfer.
We show that our model can generate natural images that respect the color palette the user asked for.
Our method can be usefully extended to another practical edit: recoloring patches of an image while respecting the source texture.
arXiv Detail & Related papers (2023-07-06T00:07:32Z) - Improved Diffusion-based Image Colorization via Piggybacked Models [19.807766482434563]
We introduce a colorization model piggybacking on the existing powerful T2I diffusion model.
A diffusion guider is designed to incorporate the pre-trained weights of the latent diffusion model.
A lightness-aware VQVAE will then generate the colorized result with pixel-perfect alignment to the given grayscale image.
arXiv Detail & Related papers (2023-04-21T16:23:24Z) - BiSTNet: Semantic Image Prior Guided Bidirectional Temporal Feature
Fusion for Deep Exemplar-based Video Colorization [70.14893481468525]
We present an effective BiSTNet to explore colors of reference exemplars and utilize them to help video colorization.
We first establish the semantic correspondence between each frame and the reference exemplars in deep feature space to explore color information from reference exemplars.
We develop a mixed expert block to extract semantic information for modeling the object boundaries of frames so that the semantic image prior can better guide the colorization process.
arXiv Detail & Related papers (2022-12-05T13:47:15Z) - Detecting Recolored Image by Spatial Correlation [60.08643417333974]
Image recoloring is an emerging editing technique that can manipulate the color values of an image to give it a new style.
In this paper, we explore a solution from the perspective of the spatial correlation, which exhibits the generic detection capability for both conventional and deep learning-based recoloring.
Our method achieves the state-of-the-art detection accuracy on multiple benchmark datasets and exhibits well generalization for unknown types of recoloring methods.
arXiv Detail & Related papers (2022-04-23T01:54:06Z) - Color2Style: Real-Time Exemplar-Based Image Colorization with
Self-Reference Learning and Deep Feature Modulation [29.270149925368674]
We present a deep exemplar-based image colorization approach named Color2Style to resurrect grayscale image media by filling them with vibrant colors.
Our method exploits a simple yet effective deep feature modulation (DFM) module, which injects the color embeddings extracted from the reference image into the deep representations of the input grayscale image.
arXiv Detail & Related papers (2021-06-15T10:05:58Z) - Deep Line Art Video Colorization with a Few References [49.7139016311314]
We propose a deep architecture to automatically color line art videos with the same color style as the given reference images.
Our framework consists of a color transform network and a temporal constraint network.
Our model can achieve even better coloring results by fine-tuning the parameters with only a small amount of samples.
arXiv Detail & Related papers (2020-03-24T06:57:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.