ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text
- URL: http://arxiv.org/abs/2401.01456v3
- Date: Wed, 3 Jul 2024 04:18:49 GMT
- Title: ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text
- Authors: Dingkun Yan, Liang Yuan, Erwin Wu, Yuma Nishioka, Issei Fujishiro, Suguru Saito,
- Abstract summary: We introduce two variations of an image-guided latent diffusion model utilizing different image tokens from the pre-trained CLIP image encoder.
We propose corresponding manipulation methods to adjust their results sequentially using weighted text inputs.
- Score: 5.675944597452309
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion models have recently demonstrated their effectiveness in generating extremely high-quality images and are now utilized in a wide range of applications, including automatic sketch colorization. Although many methods have been developed for guided sketch colorization, there has been limited exploration of the potential conflicts between image prompts and sketch inputs, which can lead to severe deterioration in the results. Therefore, this paper exhaustively investigates reference-based sketch colorization models that aim to colorize sketch images using reference color images. We specifically investigate two critical aspects of reference-based diffusion models: the "distribution problem", which is a major shortcoming compared to text-based counterparts, and the capability in zero-shot sequential text-based manipulation. We introduce two variations of an image-guided latent diffusion model utilizing different image tokens from the pre-trained CLIP image encoder and propose corresponding manipulation methods to adjust their results sequentially using weighted text inputs. We conduct comprehensive evaluations of our models through qualitative and quantitative experiments as well as a user study.
Related papers
- Test-time Conditional Text-to-Image Synthesis Using Diffusion Models [15.24270990274781]
TINTIN: Test-time Conditional Text-to-Image Synthesis using Diffusion Models is a new training-free test-time only algorithm.
We demonstrate significant improvement over the current state-of-the-art, both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-11-16T13:32:18Z) - Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling [49.41822427811098]
We present Kaleido, a novel approach that enhances the diversity of samples by incorporating autoregressive latent priors.
Kaleido integrates an autoregressive language model that encodes the original caption and generates latent variables.
We show that Kaleido adheres closely to the guidance provided by the generated latent variables, demonstrating its capability to effectively control and direct the image generation process.
arXiv Detail & Related papers (2024-05-31T17:41:11Z) - Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR)
We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z) - Control Color: Multimodal Diffusion-based Interactive Image Colorization [81.68817300796644]
Control Color (Ctrl Color) is a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model.
We present an effective way to encode user strokes to enable precise local color manipulation.
We also introduce a novel module based on self-attention and a content-guided deformable autoencoder to address the long-standing issues of color overflow and inaccurate coloring.
arXiv Detail & Related papers (2024-02-16T17:51:13Z) - Diffusing Colors: Image Colorization with Text Guided Diffusion [11.727899027933466]
We present a novel image colorization framework that utilizes image diffusion techniques with granular text prompts.
Our method provides a balance between automation and control, outperforming existing techniques in terms of visual quality and semantic coherence.
Our approach holds potential particularly for color enhancement and historical image colorization.
arXiv Detail & Related papers (2023-12-07T08:59:20Z) - DiffColor: Toward High Fidelity Text-Guided Image Colorization with
Diffusion Models [12.897939032560537]
We propose a new method called DiffColor to recover vivid colors conditioned on a prompt text.
We first fine-tune a pre-trained text-to-image model to generate colorized images using a CLIP-based contrastive loss.
Then we try to obtain an optimized text embedding aligning the colorized image and the text prompt, and a fine-tuned diffusion model enabling high-quality image reconstruction.
Our method can produce vivid and diverse colors with a few iterations, and keep the structure and background intact while having colors well-aligned with the target language guidance.
arXiv Detail & Related papers (2023-08-03T09:38:35Z) - Improved Diffusion-based Image Colorization via Piggybacked Models [19.807766482434563]
We introduce a colorization model piggybacking on the existing powerful T2I diffusion model.
A diffusion guider is designed to incorporate the pre-trained weights of the latent diffusion model.
A lightness-aware VQVAE will then generate the colorized result with pixel-perfect alignment to the given grayscale image.
arXiv Detail & Related papers (2023-04-21T16:23:24Z) - Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on deepfake detection generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z) - PalGAN: Image Colorization with Palette Generative Adversarial Networks [51.59276436217957]
We propose a new GAN-based colorization approach PalGAN, integrated with palette estimation and chromatic attention.
PalGAN outperforms state-of-the-arts in quantitative evaluation and visual comparison, delivering notable diverse, contrastive, and edge-preserving appearances.
arXiv Detail & Related papers (2022-10-20T12:28:31Z) - Detecting Recolored Image by Spatial Correlation [60.08643417333974]
Image recoloring is an emerging editing technique that can manipulate the color values of an image to give it a new style.
In this paper, we explore a solution from the perspective of the spatial correlation, which exhibits the generic detection capability for both conventional and deep learning-based recoloring.
Our method achieves the state-of-the-art detection accuracy on multiple benchmark datasets and exhibits well generalization for unknown types of recoloring methods.
arXiv Detail & Related papers (2022-04-23T01:54:06Z) - Generative Probabilistic Image Colorization [2.110198946293069]
We propose a diffusion-based generative process that trains a sequence of probabilistic models to reverse each step of noise corruption.
Given a line-drawing image as input, our method suggests multiple candidate colorized images.
Our proposed approach performed well not only on color-conditional image generation tasks, but also on some practical image completion and inpainting tasks.
arXiv Detail & Related papers (2021-09-29T16:10:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.