Fine-Tuning InstructPix2Pix for Advanced Image Colorization
- URL: http://arxiv.org/abs/2312.04780v1
- Date: Fri, 8 Dec 2023 01:36:49 GMT
- Title: Fine-Tuning InstructPix2Pix for Advanced Image Colorization
- Authors: Zifeng An, Zijing Xu, Eric Fan, Qi Cao
- Abstract summary: This paper presents a novel approach to human image colorization by fine-tuning the InstructPix2Pix model.
We fine-tune the model using the IMDB-WIKI dataset, pairing black-and-white images with a diverse set of colorization prompts generated by ChatGPT.
After fine-tuning, our model quantitatively outperforms the original InstructPix2Pix model on multiple metrics.
- Score: 3.4975669723257035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel approach to human image colorization by
fine-tuning the InstructPix2Pix model, which integrates a language model
(GPT-3) with a text-to-image model (Stable Diffusion). Despite the original
InstructPix2Pix model's proficiency in editing images based on textual
instructions, it exhibits limitations in the focused domain of colorization. To
address this, we fine-tuned the model using the IMDB-WIKI dataset, pairing
black-and-white images with a diverse set of colorization prompts generated by
ChatGPT. This paper contributes by (1) applying fine-tuning techniques to
Stable Diffusion models specifically for colorization tasks, and (2) employing
generative models to create varied conditioning prompts. After fine-tuning, our
model quantitatively outperforms the original InstructPix2Pix model on multiple
metrics and qualitatively produces more realistically colored images. The code
for this project is available in the GitHub repository
https://github.com/AllenAnZifeng/DeepLearning282.
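The authors' implementation is in the repository above; purely as an illustration of the training setup the abstract describes, the following is a minimal sketch of one fine-tuning step. It assumes the public timbrooks/instruct-pix2pix checkpoint and the diffusers/transformers APIs, and the example prompts are placeholders standing in for the ChatGPT-generated instruction set paired with IMDB-WIKI images.

```python
# Minimal sketch (not the paper's released code) of one InstructPix2Pix
# fine-tuning step for colorization. Assumptions: the public
# timbrooks/instruct-pix2pix checkpoint and placeholder prompts in place of
# the ChatGPT-generated instruction set.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "timbrooks/instruct-pix2pix"

tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").to(device)
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device)
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device)
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

# Only the UNet is updated; the VAE and text encoder stay frozen.
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

# Illustrative instructions; the paper generates a varied set with ChatGPT.
example_prompts = [
    "Colorize this black-and-white photograph",
    "Add natural, realistic colors to this image",
    "Turn this grayscale portrait into a color photo",
]

def training_step(gray, color, prompts):
    """One denoising step. `gray`/`color` are image batches scaled to [-1, 1]."""
    with torch.no_grad():
        # Latents of the color target (scaled, as in Stable Diffusion) and of
        # the grayscale conditioning image (unscaled, per InstructPix2Pix).
        latents = vae.encode(color).latent_dist.sample() * vae.config.scaling_factor
        cond = vae.encode(gray).latent_dist.mode()
        ids = tokenizer(prompts, padding="max_length", truncation=True,
                        max_length=tokenizer.model_max_length,
                        return_tensors="pt").input_ids.to(device)
        text_emb = text_encoder(ids)[0]

    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=device)
    noisy = scheduler.add_noise(latents, noise, t)

    # InstructPix2Pix conditions on the input image by channel concatenation:
    # the UNet sees 8 latent channels (4 noisy target + 4 conditioning).
    pred = unet(torch.cat([noisy, cond], dim=1), t,
                encoder_hidden_states=text_emb).sample
    loss = F.mse_loss(pred.float(), noise.float())

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

This mirrors the standard InstructPix2Pix denoising objective; in a colorization run, the grayscale/color pairs would come from IMDB-WIKI images and their grayscale conversions.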
Related papers
- ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement [20.45850285936787]
We propose to learn specific color prompts tailored to user-selected colors.
Our method, denoted ColorPeel, successfully assists T2I models to peel off novel color prompts.
Our findings represent a significant step towards improving precision and versatility of T2I models.
arXiv Detail & Related papers (2024-07-09T19:26:34Z) - Automatic Controllable Colorization via Imagination [55.489416987587305]
We propose a framework for automatic colorization that allows for iterative editing and modifications.
By understanding the content within a grayscale image, we utilize a pre-trained image generation model to generate multiple images that contain the same content.
These images serve as references for coloring, mimicking the process of human experts.
arXiv Detail & Related papers (2024-04-08T16:46:07Z) - Direct Consistency Optimization for Compositional Text-to-Image
Personalization [73.94505688626651]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency.
We propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the deviation from the pretrained model.
arXiv Detail & Related papers (2024-02-19T09:52:41Z) - ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text [5.675944597452309]
We introduce two variations of an image-guided latent diffusion model utilizing different image tokens from the pre-trained CLIP image encoder.
We propose corresponding manipulation methods to adjust their results sequentially using weighted text inputs.
arXiv Detail & Related papers (2024-01-02T22:46:12Z) - Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and
Latent Diffusion [50.59261592343479]
We present Kandinsky, a novel exploration of latent diffusion architecture.
An image prior model is trained separately to map text embeddings to CLIP image embeddings.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z) - Language-based Photo Color Adjustment for Graphic Designs [38.43984897069872]
We introduce an interactive language-based approach for photo recoloring.
Our model can predict the source colors and the target regions, and then recolor the target regions with the source colors based on the given language-based instruction.
arXiv Detail & Related papers (2023-08-06T08:53:49Z) - BLIP-Diffusion: Pre-trained Subject Representation for Controllable
Text-to-Image Generation and Editing [73.74570290836152]
BLIP-Diffusion is a new subject-driven image generation model that supports multimodal control.
Unlike other subject-driven generation models, BLIP-Diffusion introduces a new multimodal encoder which is pre-trained to provide subject representation.
arXiv Detail & Related papers (2023-05-24T04:51:04Z) - LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image
Diffusion Models with Large Language Models [62.75006608940132]
This work proposes to enhance prompt understanding capabilities in text-to-image diffusion models.
Our method leverages a pretrained large language model for grounded generation in a novel two-stage process.
Our method significantly outperforms the base diffusion model and several strong baselines in accurately generating images.
arXiv Detail & Related papers (2023-05-23T03:59:06Z) - Improved Diffusion-based Image Colorization via Piggybacked Models [19.807766482434563]
We introduce a colorization model piggybacking on the existing powerful T2I diffusion model.
A diffusion guider is designed to incorporate the pre-trained weights of the latent diffusion model.
A lightness-aware VQVAE will then generate the colorized result with pixel-perfect alignment to the given grayscale image.
arXiv Detail & Related papers (2023-04-21T16:23:24Z) - InstructPix2Pix: Learning to Follow Image Editing Instructions [103.77092910685764]
We propose a method for editing images from human instructions.
Given an input image and a written instruction that tells the model what to do, our model follows the instruction to edit the image.
We show compelling editing results for a diverse collection of input images and written instructions.
arXiv Detail & Related papers (2022-11-17T18:58:43Z)