Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image
Inpainting
- URL: http://arxiv.org/abs/2212.06909v2
- Date: Wed, 12 Apr 2023 22:42:08 GMT
- Title: Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image
Inpainting
- Authors: Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai
Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu
Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan
- Abstract summary: We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting.
Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training.
To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting.
- Score: 53.708523312636096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-guided image editing can have a transformative impact in supporting
creative applications. A key challenge is to generate edits that are faithful
to input text prompts, while consistent with input images. We present Imagen
Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided
image inpainting. Imagen Editor's edits are faithful to the text prompts, which
is accomplished by using object detectors to propose inpainting masks during
training. In addition, Imagen Editor captures fine details in the input image
by conditioning the cascaded pipeline on the original high resolution image. To
improve qualitative and quantitative evaluation, we introduce EditBench, a
systematic benchmark for text-guided image inpainting. EditBench evaluates
inpainting edits on natural and generated images exploring objects, attributes,
and scenes. Through extensive human evaluation on EditBench, we find that
object-masking during training leads to across-the-board improvements in
text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and
Stable Diffusion -- and, as a cohort, these models are better at
object-rendering than text-rendering, and handle material/color/size attributes
better than count/shape attributes.
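
To make the object-masking idea concrete, here is a minimal Python/NumPy sketch of proposing inpainting masks from detected object boxes instead of random regions during fine-tuning. The detector interface, function names, and fallback logic are illustrative assumptions, not Imagen Editor's actual implementation; the intuition is that a random box often covers nothing the caption mentions, while an object box forces the model to ground the prompt in the masked content.

```python
import random
import numpy as np

def random_box_mask(h, w, rng=random):
    # Baseline strategy: mask an arbitrary rectangle. Such regions often
    # contain nothing the caption describes, which weakens text alignment.
    x0, y0 = rng.randrange(w // 2), rng.randrange(h // 2)
    x1, y1 = rng.randrange(x0 + 1, w), rng.randrange(y0 + 1, h)
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

def object_box_mask(h, w, detections, rng=random):
    # Object masking: cover a detected object's bounding box, so the
    # masked region corresponds to something the caption can actually
    # name. `detections` is a list of (x0, y0, x1, y1) boxes from any
    # off-the-shelf detector (hypothetical interface).
    if not detections:  # fall back when the detector finds nothing
        return random_box_mask(h, w, rng)
    x0, y0, x1, y1 = map(int, rng.choice(detections))
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

def make_training_example(image, caption, detections):
    # Standard inpainting fine-tuning pair: the model sees the masked
    # image, the mask, and the caption, and learns to restore the target.
    h, w = image.shape[:2]
    mask = object_box_mask(h, w, detections)
    masked = image.copy()
    masked[mask] = 0  # zero out the region to be inpainted
    return masked, mask, caption, image
```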
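Similarly, a rough PyTorch sketch (names and channel layout assumed, not the paper's API) of what "conditioning the cascaded pipeline on the original high resolution image" could look like: each stage resizes the full-resolution image and mask to its own working resolution and stacks them onto its input as extra channels, so even the super-resolution stages can copy fine detail from the unedited regions.

```python
import torch
import torch.nn.functional as F

def stage_inputs(noisy, original_hires, mask_hires):
    # Resize the full-resolution image and mask to this stage's working
    # resolution and concatenate them to the noisy input as extra channels.
    h, w = noisy.shape[-2:]
    cond_img = F.interpolate(original_hires, size=(h, w),
                             mode="bilinear", align_corners=False)
    cond_mask = F.interpolate(mask_hires, size=(h, w), mode="nearest")
    # Hide the region to be edited; keep everything outside the mask.
    return torch.cat([noisy, cond_img * (1.0 - cond_mask), cond_mask], dim=1)
```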
Related papers
- ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models [11.830273909934688]
Modern Text-to-Image (T2I) diffusion models have revolutionized image editing by enabling the generation of high-quality images.
We propose ReEdit, a modular and efficient end-to-end framework that captures edits in both text and image modalities.
Our results demonstrate that ReEdit consistently outperforms contemporary approaches both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-11-06T15:19:24Z)
- DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images [55.546024767130994]
We propose a novel model to enhance the text-based control of an image editor by explicitly reasoning about which parts of the image to alter or preserve.
It relies on word alignments between a description of the original source image and the instruction that reflects the needed updates, together with the input image.
It is evaluated on a subset of the Bison dataset and a self-defined dataset dubbed Dream.
arXiv Detail & Related papers (2024-04-27T22:45:47Z)
- TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts [119.84478647745658]
TIP-Editor is a 3D scene editing framework that accepts both text and image prompts and a 3D bounding box to specify the editing region.
Experiments have demonstrated that TIP-Editor conducts accurate editing following the text and image prompts in the specified bounding box region.
arXiv Detail & Related papers (2024-01-26T12:57:05Z)
- Visual Instruction Inversion: Image Editing via Visual Prompting [34.96778567507126]
We present a method for image editing via visual prompting.
We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions.
arXiv Detail & Related papers (2023-07-26T17:50:10Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models [6.34777393532937]
We propose an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing.
Our proposed editing method consists of a reconstruction stage and an editing stage.
Experiments on ImageNet demonstrate the superior editing performance of our method compared to the state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-08T03:34:33Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited (a sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
- Prompt-to-Prompt Image Editing with Cross Attention Control [41.26939787978142]
We present an intuitive prompt-to-prompt editing framework, where the edits are controlled by text only.
We show our results over diverse images and prompts, demonstrating high-quality synthesis and fidelity to the edited prompts (the attention-injection mechanism is sketched after this list).
arXiv Detail & Related papers (2022-08-02T17:55:41Z)
- EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high quality, high precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)
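
As referenced in the DiffEdit entry above, here is a minimal Python sketch of its automatic-mask idea: where a text-conditioned denoiser's noise estimates for the source and target prompts disagree, the image must change to satisfy the edit. `eps_model` is a hypothetical noise predictor, and the tensor shapes and single-timestep simplification are assumptions for illustration; the real method's noise schedule and averaging differ.

```python
import torch

def add_noise(x0, noise, alpha_bar_t):
    # Standard DDPM forward process: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps.
    return alpha_bar_t.sqrt() * x0 + (1.0 - alpha_bar_t).sqrt() * noise

@torch.no_grad()
def diffedit_mask(eps_model, x0, t, alpha_bar_t, src_emb, tgt_emb,
                  n_samples=8, threshold=0.5):
    # Average the source/target noise-estimate disagreement over several
    # noise draws, normalize, and threshold into a binary edit mask.
    diffs = []
    for _ in range(n_samples):
        noise = torch.randn_like(x0)
        x_t = add_noise(x0, noise, alpha_bar_t)
        e_src = eps_model(x_t, t, src_emb)  # hypothetical predictor call
        e_tgt = eps_model(x_t, t, tgt_emb)
        diffs.append((e_src - e_tgt).abs().mean(dim=1, keepdim=True))
    d = torch.stack(diffs).mean(dim=0)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)  # normalize to [0, 1]
    return (d > threshold).float()                  # binary edit mask
```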
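And a toy sketch of the cross-attention control behind Prompt-to-Prompt: attention probabilities recorded while generating with the source prompt are injected when generating with the edited prompt, so spatial layout is preserved while content follows the new text. Heavily simplified; all names here are illustrative, not the paper's code.

```python
import torch

def cross_attention(q, k, v, injected_probs=None):
    # Scaled dot-product cross-attention from image queries to text keys.
    # If attention maps saved from the source-prompt pass are supplied,
    # reuse them in place of the freshly computed ones: layout then
    # follows the original generation while the values (and thus the
    # content) come from the edited prompt.
    scale = q.shape[-1] ** -0.5
    probs = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    if injected_probs is not None:
        probs = injected_probs  # prompt-to-prompt control step
    return probs @ v, probs
```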
This list is automatically generated from the titles and abstracts of the papers on this site.