Visual Instruction Inversion: Image Editing via Visual Prompting
- URL: http://arxiv.org/abs/2307.14331v1
- Date: Wed, 26 Jul 2023 17:50:10 GMT
- Title: Visual Instruction Inversion: Image Editing via Visual Prompting
- Authors: Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee
- Abstract summary: We present a method for image editing via visual prompting.
We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions.
- Score: 34.96778567507126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-conditioned image editing has emerged as a powerful tool for editing
images. However, in many situations, language can be ambiguous and ineffective
in describing specific image edits. When faced with such challenges, visual
prompts can be a more informative and intuitive way to convey ideas. We present
a method for image editing via visual prompting. Given example pairs that
represent the "before" and "after" images of an edit, our goal is to learn a
text-based editing direction that can be used to perform the same edit on new
images. We leverage the rich, pretrained editing capabilities of text-to-image
diffusion models by inverting visual prompts into editing instructions. Our
results show that with just one example pair, we can achieve competitive
results compared to state-of-the-art text-conditioned image editing frameworks.
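As a rough illustration of what inverting a visual prompt into an instruction can look like, the sketch below optimizes a soft instruction embedding so that a frozen, instruction-conditioned denoiser maps the "before" latent onto the "after" latent. The stub network, tensor shapes, and noise schedule are toy assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class StubEditor(nn.Module):
    """Stand-in for a frozen, pretrained instruction-conditioned diffusion
    editor; the architecture and shapes here are toy placeholders."""
    def __init__(self, latent_dim=64, instr_tokens=8, instr_dim=16):
        super().__init__()
        self.net = nn.Linear(2 * latent_dim + instr_tokens * instr_dim, latent_dim)

    def forward(self, noisy_after, before, instruction):
        x = torch.cat([noisy_after, before, instruction.flatten(1)], dim=1)
        return self.net(x)  # predicted noise (the stub ignores the timestep)

editor = StubEditor()
for p in editor.parameters():       # the pretrained editor stays frozen
    p.requires_grad_(False)

before = torch.randn(1, 64)         # latent of the "before" example image
after = torch.randn(1, 64)          # latent of the "after" example image
instruction = torch.zeros(1, 8, 16, requires_grad=True)  # learned soft prompt
opt = torch.optim.Adam([instruction], lr=1e-2)

for step in range(200):
    t = torch.rand(1)               # toy continuous timestep in [0, 1)
    noise = torch.randn_like(after)
    noisy_after = (1 - t).sqrt() * after + t.sqrt() * noise  # toy noise schedule
    loss = (editor(noisy_after, before, instruction) - noise).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
# `instruction` now encodes the edit and can condition the editor on new images.
```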
Related papers
- InstructBrush: Learning Attention-based Instruction Optimization for Image Editing [54.07526261513434]
InstructBrush is an inversion method for instruction-based image editing.
It extracts editing effects from image pairs as editing instructions, which are further applied for image editing.
Our approach achieves superior performance in editing and is more semantically consistent with the target editing effects.
arXiv Detail & Related papers (2024-03-27T15:03:38Z)
- An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control [21.624984690721842]
D-Edit is a framework to disentangle the comprehensive image-prompt interaction into several item-prompt interactions.
It is based on pretrained diffusion models with cross-attention layers disentangled and adopts a two-step optimization to build item-prompt associations.
We demonstrate state-of-the-art results in four types of editing operations including image-based, text-based, mask-based editing, and item removal.
arXiv Detail & Related papers (2024-03-07T20:06:29Z)
- Edit One for All: Interactive Batch Image Editing [44.50631647670942]
This paper presents a novel method for interactive batch image editing using StyleGAN as the medium.
Given an edit specified by users in an example image (e.g., make the face frontal), our method can automatically transfer that edit to other test images.
Experiments demonstrate that edits performed using our method have similar visual quality to existing single-image-editing methods.
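A heavily simplified sketch of transferring an example edit across a batch in a GAN latent space; the plain additive transfer and all names are assumptions, and the paper additionally adapts the edit to each image:

```python
import torch

# All tensors are random placeholders; real latents would come from a GAN
# inversion encoder mapping images into StyleGAN's W space.
w_original = torch.randn(1, 512)    # example image before the user's edit
w_edited = torch.randn(1, 512)      # the same image after the user's edit
w_batch = torch.randn(16, 512)      # latents of the images to edit in batch

delta = w_edited - w_original       # edit direction defined by the example
w_batch_edited = w_batch + delta    # naive broadcast of the edit to the batch
```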
arXiv Detail & Related papers (2024-01-18T18:58:44Z)
- Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation that accommodates multiple editing instruction types.
By allowing the influence of each loss function to be adjusted, we build a flexible editing solution that can be tailored to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
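The adjustable per-loss influence reduces to a weighted sum of condition-specific objectives optimized at inference time; the quadratic placeholder losses below are assumptions standing in for real text, pose, and scribble objectives:

```python
import torch

latent = torch.randn(1, 64, requires_grad=True)  # latent optimized at inference
opt = torch.optim.Adam([latent], lr=1e-2)

# Placeholder targets; real objectives would score text alignment, pose
# agreement, and scribble agreement against the generated image.
targets = {"text": torch.randn(64), "pose": torch.randn(64), "scribble": torch.randn(64)}
weights = {"text": 1.0, "pose": 0.5, "scribble": 0.25}  # user-tunable influence

for step in range(100):
    total = sum(weights[k] * (latent - v).pow(2).mean() for k, v in targets.items())
    opt.zero_grad()
    total.backward()
    opt.step()
```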
arXiv Detail & Related papers (2023-11-28T15:31:11Z)
- Emu Edit: Precise Image Editing via Recognition and Generation Tasks [62.95717180730946]
We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing.
We train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and Computer Vision tasks.
We show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples.
arXiv Detail & Related papers (2023-11-16T18:55:58Z)
- Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models [6.34777393532937]
We propose an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing.
Our proposed editing method consists of a reconstruction stage and an editing stage.
Experiments on ImageNet demonstrate the superior editing performance of our method compared to the state-of-the-art baselines.
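A minimal sketch of the two-stage recipe, with a frozen linear stub in place of the diffusion model; the stub, shapes, and interpolation weight are assumptions:

```python
import torch
import torch.nn as nn

decoder = nn.Linear(32, 64)         # stand-in for the frozen diffusion model
for p in decoder.parameters():
    p.requires_grad_(False)

image = torch.randn(1, 64)          # (latent of the) image to be inverted
c_opt = torch.zeros(1, 32, requires_grad=True)  # learnable conditioning embedding
opt = torch.optim.Adam([c_opt], lr=1e-2)

# Reconstruction stage: tune the conditioning so the frozen model
# reproduces the input image.
for step in range(200):
    loss = (decoder(c_opt) - image).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Editing stage: move toward the target prompt's embedding; alpha trades
# edit strength against fidelity to the input.
c_target = torch.randn(1, 32)       # embedding of the editing prompt
alpha = 0.7
edited = decoder(alpha * c_target + (1 - alpha) * c_opt.detach())
```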
arXiv Detail & Related papers (2023-05-08T03:34:33Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
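A toy sketch of a single cross-attention guidance step; the attention function is a differentiable stand-in and the step size is an assumption:

```python
import torch

def cross_attention_maps(latent):
    """Differentiable stand-in; a real model would expose the UNet's
    cross-attention probabilities for the current denoising step."""
    return torch.softmax(latent.view(1, 8, 8), dim=-1)

latent = torch.randn(1, 64, requires_grad=True)  # noisy latent during editing
reference = cross_attention_maps(torch.randn(1, 64)).detach()  # from reconstruction

# One guidance step: nudge the latent so its attention maps stay close to the
# reference maps, preserving the input image's structure.
loss = (cross_attention_maps(latent) - reference).pow(2).mean()
loss.backward()
with torch.no_grad():
    latent -= 0.1 * latent.grad     # 0.1 is an assumed step size
```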
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
- Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting [53.708523312636096]
We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting.
Its edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training.
To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting.
arXiv Detail & Related papers (2022-12-13T21:25:11Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting regions of the input image that need to be edited, by contrasting the model's predictions under different text conditionings.
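The automatic mask can be sketched as contrasting noise estimates under two text conditionings (the stub predictor and threshold are assumptions; the paper stabilizes the difference by averaging over several noise draws):

```python
import torch

def predict_noise(x_t, prompt_embedding):
    """Stand-in for a text-conditioned diffusion model's noise prediction."""
    return 0.1 * x_t + prompt_embedding

x_t = torch.randn(8, 8)             # noised input image at some timestep (toy size)
c_query = torch.randn(8, 8)         # embedding of the query (edit) text
c_reference = torch.randn(8, 8)     # embedding of the reference text

# Where the two conditionings disagree is where the edit should apply.
diff = (predict_noise(x_t, c_query) - predict_noise(x_t, c_reference)).abs()
mask = (diff / diff.max() > 0.5).float()  # normalize and binarize (0.5 assumed)
```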
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
- UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image [2.999198565272416]
We make the observation that image-generation models can be converted to image-editing models simply by fine-tuning them on a single image.
We propose UniTune, a novel image editing method that takes as input an arbitrary image and a textual edit description, and carries out the edit while maintaining high fidelity to the input image.
We demonstrate that it is broadly applicable and can perform a surprisingly wide range of expressive editing operations, including those requiring significant visual changes that were previously impossible.
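That observation can be sketched as a brief single-image fine-tune followed by conditioned sampling; the linear stub, rare-token embedding, and additive prompt composition are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = nn.Linear(32, 64)           # stand-in for a text-to-image generator
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

image = torch.randn(1, 64)          # the single input image (as a latent)
rare_token = torch.randn(1, 32)     # embedding of a rare token tied to it

# Fine-tune briefly so the model associates the rare token with this image.
for step in range(50):
    loss = (model(rare_token) - image).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Edit: condition on the rare token plus the edit text; the additive
# composition here is a toy assumption.
edit_text = torch.randn(1, 32)
edited = model(rare_token + edit_text)
```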
arXiv Detail & Related papers (2022-10-17T23:46:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.