Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models
- URL: http://arxiv.org/abs/2305.04441v1
- Date: Mon, 8 May 2023 03:34:33 GMT
- Title: Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models
- Authors: Wenkai Dong, Song Xue, Xiaoyue Duan, Shumin Han
- Abstract summary: We propose an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing.
Our proposed editing method consists of a reconstruction stage and an editing stage.
Experiments on ImageNet demonstrate the superior editing performance of our method compared to the state-of-the-art baselines.
- Score: 6.34777393532937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, large-scale language-image models (e.g., text-guided diffusion
models) have considerably improved image generation, producing
photorealistic images in various domains. Based on this success, current
image editing methods use text to achieve intuitive and versatile
modification of images. To edit a real image using diffusion models, one must
first invert the image to a noisy latent from which an edited image is sampled
with a target text prompt. However, most methods lack one of the following:
user-friendliness (e.g., additional masks or precise descriptions of the input
image are required), generalization to larger domains, or high fidelity to the
input image. In this paper, we design an accurate and quick inversion
technique, Prompt Tuning Inversion, for text-driven image editing.
Specifically, our proposed editing method consists of a reconstruction stage
and an editing stage. In the first stage, we encode the information of the
input image into a learnable conditional embedding via Prompt Tuning Inversion.
In the second stage, we apply classifier-free guidance to sample the edited
image, where the conditional embedding is calculated by linearly interpolating
between the target embedding and the optimized one obtained in the first stage.
This technique allows our method to achieve a superior trade-off between
editability and fidelity to the input image. For example, we can change the color
of a specific object while preserving its original shape and background under
the guidance of only a target text prompt. Extensive experiments on ImageNet
demonstrate the superior editing performance of our method compared to the
state-of-the-art baselines.
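To make the editing stage concrete, the following is a minimal, illustrative sketch (an assumption-based reading of the abstract, not the authors' released code): the conditional embedding used for sampling is a linear interpolation between the embedding optimized in the reconstruction stage and the target text embedding, combined with classifier-free guidance. The names predict_noise, eta, and guidance_scale, as well as the toy MLP standing in for a text-conditioned UNet, are hypothetical.

import torch
import torch.nn as nn

torch.manual_seed(0)

latent_dim, embed_dim = 16, 8
# Toy noise predictor standing in for a text-conditioned UNet (assumption).
denoiser = nn.Sequential(
    nn.Linear(latent_dim + embed_dim, 32),
    nn.ReLU(),
    nn.Linear(32, latent_dim),
)

def predict_noise(x_t, cond):
    # Stand-in for epsilon_theta(x_t, c): predict noise from latent + embedding.
    return denoiser(torch.cat([x_t, cond], dim=-1))

def guided_noise(x_t, cond_src, cond_target, uncond, eta=0.7, guidance_scale=7.5):
    # Linearly interpolate between the embedding optimized in the
    # reconstruction stage (cond_src) and the target prompt embedding.
    cond_edit = eta * cond_target + (1.0 - eta) * cond_src
    # Classifier-free guidance with the interpolated embedding.
    eps_uncond = predict_noise(x_t, uncond)
    eps_cond = predict_noise(x_t, cond_edit)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy usage: random tensors take the place of real latents and embeddings.
x_t = torch.randn(1, latent_dim)         # noisy latent at some timestep
cond_src = torch.randn(1, embed_dim)     # embedding learned via Prompt Tuning Inversion
cond_target = torch.randn(1, embed_dim)  # embedding of the target text prompt
uncond = torch.zeros(1, embed_dim)       # null (unconditional) embedding
print(guided_noise(x_t, cond_src, cond_target, uncond).shape)  # torch.Size([1, 16])

Under this reading, the interpolation weight (eta above) controls the trade-off the abstract describes: values closer to 1 follow the target prompt more strongly, while values closer to 0 remain more faithful to the input image.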
Related papers
- PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models [80.98455219375862]
We present the first text-based image editing approach for object parts based on pre-trained diffusion models.
Our approach is preferred by users 77-90% of the time in user studies.
arXiv Detail & Related papers (2025-02-06T13:08:43Z)
- Textualize Visual Prompt for Image Editing via Diffusion Bridge [15.696208035498753]
Current visual prompt methods rely on a pretrained text-guided image-to-image generative model.
We present a framework based on any single text-to-image model, without relying on an explicit image-to-image model.
arXiv Detail & Related papers (2025-01-07T03:33:22Z)
- TurboEdit: Instant text-based image editing [32.06820085957286]
We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models.
We introduce an encoder-based iterative inversion technique. The inversion network is conditioned on the input image and the reconstructed image from the previous step, allowing correction of the next reconstruction towards the input image.
Our approach facilitates realistic text-guided image edits in real time, requiring only 8 function evaluations (NFEs) for inversion and 4 NFEs per edit.
arXiv Detail & Related papers (2024-08-14T18:02:24Z)
- Tuning-Free Image Customization with Image and Text Guidance [65.9504243633169]
We introduce a tuning-free framework for simultaneous text-image-guided image customization.
Our approach preserves the semantic features of the reference image subject while allowing modification of detailed attributes based on text descriptions.
Our approach outperforms previous methods in both human and quantitative evaluations.
arXiv Detail & Related papers (2024-03-19T11:48:35Z)
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [115.49488548588305]
A significant research effort is focused on exploiting the capabilities of pretrained diffusion models for image editing.
Existing methods either finetune the model or invert the image in the latent space of the pretrained model.
They suffer from two problems: Unsatisfying results for selected regions and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
- Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting [53.708523312636096]
We present Imagen Editor, a cascaded diffusion model built by fine-tuning for text-guided image inpainting.
Edits are faithful to the text prompts; this is accomplished by using object detectors to propose inpainting masks during training.
To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting.
arXiv Detail & Related papers (2022-12-13T21:25:11Z)
- Null-text Inversion for Editing Real Images using Guided Diffusion Models [44.27570654402436]
We introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image.
Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt editing.
arXiv Detail & Related papers (2022-11-17T18:58:14Z)
- Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models [0.0]
We propose an optimization-free and zero fine-tuning framework that applies complex and non-rigid edits to a single real image via a text prompt.
We prove our method's efficacy in producing high-quality, diverse, semantically coherent, and faithful real image edits.
arXiv Detail & Related papers (2022-11-15T01:07:38Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited.
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.