Emu Edit: Precise Image Editing via Recognition and Generation Tasks
- URL: http://arxiv.org/abs/2311.10089v1
- Date: Thu, 16 Nov 2023 18:55:58 GMT
- Title: Emu Edit: Precise Image Editing via Recognition and Generation Tasks
- Authors: Shelly Sheynin, Adam Polyak, Uriel Singer, Yuval Kirstain, Amit Zohar,
Oron Ashual, Devi Parikh, Yaniv Taigman
- Abstract summary: We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing.
We train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and Computer Vision tasks.
We show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples.
- Score: 62.95717180730946
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction-based image editing holds immense potential for a variety of
applications, as it enables users to perform any editing operation using a
natural language instruction. However, current models in this domain often
struggle with accurately executing user instructions. We present Emu Edit, a
multi-task image editing model which sets state-of-the-art results in
instruction-based image editing. To develop Emu Edit we train it to multi-task
across an unprecedented range of tasks, such as region-based editing, free-form
editing, and Computer Vision tasks, all of which are formulated as generative
tasks. Additionally, to enhance Emu Edit's multi-task learning abilities, we
provide it with learned task embeddings which guide the generation process
towards the correct edit type. Both these elements are essential for Emu Edit's
outstanding performance. Furthermore, we show that Emu Edit can generalize to
new tasks, such as image inpainting, super-resolution, and compositions of
editing tasks, with just a few labeled examples. This capability offers a
significant advantage in scenarios where high-quality samples are scarce.
Lastly, to facilitate a more rigorous and informed assessment of instructable
image editing models, we release a new challenging and versatile benchmark that
includes seven different image editing tasks.
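To make the abstract's two technical claims more concrete (learned task embeddings that steer generation toward the correct edit type, and few-shot generalization to new tasks), the following is a minimal PyTorch-style sketch. It is not the released Emu Edit code: the class, argument, and helper names (TaskConditionedDenoiser, adapt_to_new_task, denoising_loss) are illustrative assumptions, and the few-shot routine, which freezes the denoiser and optimizes only a single new task embedding, is one plausible reading of the abstract's few-shot claim.

```python
# Minimal sketch of task-embedding conditioning for a diffusion editing model.
# NOT the released Emu Edit code; names and the few-shot routine are assumptions.
import torch
import torch.nn as nn

class TaskConditionedDenoiser(nn.Module):
    """Wraps a diffusion denoiser (e.g. a U-Net noise predictor) and appends a
    learned per-task embedding to the text-instruction conditioning sequence.
    (Input-image conditioning, e.g. concatenating the source-image latent, is
    omitted here for brevity.)"""

    def __init__(self, denoiser: nn.Module, num_tasks: int, embed_dim: int):
        super().__init__()
        self.denoiser = denoiser
        self.task_embeddings = nn.Embedding(num_tasks, embed_dim)

    def forward(self, noisy_latent, timestep, text_emb, task_id):
        # text_emb: (B, T, D) instruction tokens; task_id: (B,) integer task index.
        task_emb = self.task_embeddings(task_id).unsqueeze(1)   # (B, 1, D)
        cond = torch.cat([text_emb, task_emb], dim=1)           # (B, T+1, D)
        # The task embedding steers generation toward the requested edit type.
        return self.denoiser(noisy_latent, timestep, cond)


def adapt_to_new_task(model, batches, denoising_loss, steps=200, lr=1e-3):
    """Few-shot adaptation sketch: freeze the denoiser and fit only a new task
    embedding on a handful of labeled examples. `denoising_loss` stands in for
    the usual noise-prediction objective and is an assumption, not an interface
    stated in the paper."""
    for p in model.denoiser.parameters():
        p.requires_grad_(False)
    # Initialize the new embedding from the mean of the existing task embeddings.
    new_emb = nn.Parameter(model.task_embeddings.weight.mean(dim=0).clone())
    opt = torch.optim.Adam([new_emb], lr=lr)
    for _ in range(steps):
        for batch in batches:  # each batch holds text_emb plus the diffusion targets
            b = batch["text_emb"].shape[0]
            cond = torch.cat(
                [batch["text_emb"], new_emb.view(1, 1, -1).expand(b, 1, -1)], dim=1
            )
            loss = denoising_loss(model.denoiser, cond, batch)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return new_emb
```

In this reading, adding a new capability such as inpainting or super-resolution amounts to learning one embedding vector rather than fine-tuning the full model, which is consistent with the abstract's claim that only a few labeled examples are needed.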
Related papers
- AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea [88.79769371584491]
We present AnyEdit, a comprehensive multi-modal instruction editing dataset.
We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, an adaptive editing process, and automated selection of editing results.
Experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models.
arXiv Detail & Related papers (2024-11-24T07:02:56Z)
- Achieving Complex Image Edits via Function Aggregation with Diffusion Models [15.509233098264513]
Diffusion models have demonstrated strong performance in generative tasks, making them ideal candidates for image editing.
We introduce FunEditor, an efficient diffusion model designed to learn atomic editing functions and perform complex edits by aggregating simpler functions.
FunEditor achieves 5 to 24 times faster inference than existing methods on complex tasks such as object movement.
arXiv Detail & Related papers (2024-08-16T02:33:55Z)
- InstructBrush: Learning Attention-based Instruction Optimization for Image Editing [54.07526261513434]
InstructBrush is an inversion technique for instruction-based image editing methods.
It extracts editing effects from image pairs as editing instructions, which can then be applied to edit new images.
Our approach achieves superior performance in editing and is more semantically consistent with the target editing effects.
arXiv Detail & Related papers (2024-03-27T15:03:38Z)
- Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By allowing the influence of each loss function to be adjusted, we build a flexible editing solution that can be tailored to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
arXiv Detail & Related papers (2023-11-28T15:31:11Z)
- LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance [0.0]
LEDITS is a lightweight approach for real-image editing that combines the Edit Friendly DDPM inversion technique with Semantic Guidance.
It achieves versatile edits, both subtle and extensive, including alterations in composition and style, while requiring neither optimization nor architectural extensions.
arXiv Detail & Related papers (2023-07-02T09:11:09Z)
- SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing [94.31103255204933]
We propose a unified model for open-domain image editing, focusing on color and tone adjustment.
Our model learns a unified editing space that is more semantic, intuitive, and easy to manipulate.
We show that by inverting image pairs into latent codes of the learned editing space, our model can be leveraged for various downstream editing tasks.
arXiv Detail & Related papers (2021-11-30T23:53:32Z)
- EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high-quality, high-precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond the EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)