EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods
- URL: http://arxiv.org/abs/2310.02426v1
- Date: Tue, 3 Oct 2023 20:46:10 GMT
- Title: EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods
- Authors: Samyadeep Basu, Mehrdad Saberi, Shweta Bhardwaj, Atoosa Malemir
Chegini, Daniela Massiceti, Maziar Sanjabi, Shell Xu Hu, Soheil Feizi
- Abstract summary: We introduce EditVal, a standardized benchmark for quantitatively evaluating text-guided image editing methods.
EditVal consists of a curated dataset of images, a set of editable attributes for each image drawn from 13 possible edit types, and an automated evaluation pipeline.
We use EditVal to benchmark 8 cutting-edge diffusion-based editing methods including SINE, Imagic and Instruct-Pix2Pix.
- Score: 52.43439659492655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A plethora of text-guided image editing methods have recently been developed
by leveraging the impressive capabilities of large-scale diffusion-based
generative models such as Imagen and Stable Diffusion. A standardized
evaluation protocol, however, does not exist to compare methods across
different types of fine-grained edits. To address this gap, we introduce
EditVal, a standardized benchmark for quantitatively evaluating text-guided
image editing methods. EditVal consists of a curated dataset of images, a set
of editable attributes for each image drawn from 13 possible edit types, and an
automated evaluation pipeline that uses pre-trained vision-language models to
assess the fidelity of generated images for each edit type. We use EditVal to
benchmark 8 cutting-edge diffusion-based editing methods including SINE, Imagic
and Instruct-Pix2Pix. We complement this with a large-scale human study where
we show that EditVal's automated evaluation pipeline is strongly correlated
with human preferences for the edit types we considered. From both the human
study and automated evaluation, we find that: (i) Instruct-Pix2Pix, Null-Text
and SINE are the top-performing methods averaged across different edit types;
however, only Instruct-Pix2Pix and Null-Text are able to preserve original
image properties; (ii) most of the editing methods fail at edits involving
spatial operations (e.g., changing the position of an object); and (iii) there is
no single 'winner' method that ranks best individually across a range of
different edit types. We hope that our benchmark can pave the way to developing
more reliable text-guided image editing tools in the future. We will publicly
release EditVal and all associated code and human-study templates to support
these research directions at https://deep-ml-research.github.io/editval/.
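The abstract describes the evaluation pipeline only at a high level: pre-trained vision-language models score the fidelity of each edited image against its target edit type. The sketch below illustrates one way such a check could look, using CLIP from Hugging Face transformers to compare how well the original and edited images match an edit prompt. This is an illustrative assumption, not EditVal's actual pipeline; the model choice, prompt format, and scoring rule are all placeholders.

```python
# Minimal sketch of an automated edit-fidelity check with a pre-trained
# vision-language model (CLIP). NOT EditVal's exact pipeline; the model name,
# prompt, and scoring rule below are illustrative assumptions.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def edit_fidelity(original: Image.Image, edited: Image.Image, edit_prompt: str) -> float:
    """Return how much the edited image moved toward the edit prompt,
    relative to the original image (higher = edit applied more faithfully)."""
    inputs = processor(text=[edit_prompt], images=[original, edited],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image has shape (2, 1): scaled image-text similarities
    sim_original, sim_edited = out.logits_per_image.squeeze(-1).tolist()
    return sim_edited - sim_original

# Hypothetical usage (file paths and prompt are placeholders):
# score = edit_fidelity(Image.open("dog.png"), Image.open("dog_edited.png"),
#                       "a photo of a dog wearing a red hat")
```

A single CLIP similarity is only a proxy; a benchmark like EditVal would also need checks that original image properties are preserved and per-edit-type criteria (e.g., detecting object position for spatial edits).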
Related papers
- PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models [80.98455219375862]
We present the first text-based image editing approach for object parts based on pre-trained diffusion models.
Our approach is preferred by users 77-90% of the time in conducted user studies.
arXiv Detail & Related papers (2025-02-06T13:08:43Z) - TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks: the "edit-friendly" DDPM-noise inversion approach.
We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
arXiv Detail & Related papers (2024-08-01T17:27:28Z) - Edit One for All: Interactive Batch Image Editing [44.50631647670942]
This paper presents a novel method for interactive batch image editing using StyleGAN as the medium.
Given an edit specified by users in an example image (e.g., make the face frontal), our method can automatically transfer that edit to other test images.
Experiments demonstrate that edits performed using our method have similar visual quality to existing single-image-editing methods.
arXiv Detail & Related papers (2024-01-18T18:58:44Z) - Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
arXiv Detail & Related papers (2023-10-18T17:59:02Z) - Forgedit: Text Guided Image Editing via Learning and Forgetting [17.26772361532044]
We design a novel text-guided image editing method, named Forgedit.
First, we propose a vision-language joint optimization framework capable of reconstructing the original image in 30 seconds.
Then, we propose a novel vector projection mechanism in text embedding space of Diffusion Models.
arXiv Detail & Related papers (2023-09-19T12:05:26Z) - Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion
Models [6.34777393532937]
We propose an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing.
Our proposed editing method consists of a reconstruction stage and an editing stage.
Experiments on ImageNet demonstrate the superior editing performance of our method compared to the state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-08T03:34:33Z) - Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z) - Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image
Inpainting [53.708523312636096]
We present Imagen Editor, a cascaded diffusion model built by fine-tuning on text-guided image inpainting.
Its edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training.
To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting.
arXiv Detail & Related papers (2022-12-13T21:25:11Z) - Null-text Inversion for Editing Real Images using Guided Diffusion
Models [44.27570654402436]
We introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image.
Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt editing.
arXiv Detail & Related papers (2022-11-17T18:58:14Z) - UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a
Single Image [2.999198565272416]
We make the observation that image-generation models can be converted to image-editing models simply by fine-tuning them on a single image.
We propose UniTune, a novel image editing method. UniTune takes as input an arbitrary image and a textual edit description, and carries out the edit while maintaining high fidelity to the input image.
We demonstrate that it is broadly applicable and can perform a surprisingly wide range of expressive editing operations, including those requiring significant visual changes that were previously impossible.
arXiv Detail & Related papers (2022-10-17T23:46:05Z) - EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high quality, high precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)