LoMOE: Localized Multi-Object Editing via Multi-Diffusion
- URL: http://arxiv.org/abs/2403.00437v1
- Date: Fri, 1 Mar 2024 10:46:47 GMT
- Title: LoMOE: Localized Multi-Object Editing via Multi-Diffusion
- Authors: Goirik Chakrabarty, Aditya Chandrasekar, Ramya Hebbalaguppe, Prathosh AP
- Abstract summary: We introduce a novel framework for zero-shot localized multi-object editing through a multi-diffusion process.
Our approach leverages foreground masks and corresponding simple text prompts that exert localized influences on the target regions.
A combination of cross-attention and background preservation losses within the latent space ensures that the characteristics of the object being edited are preserved.
- Score: 8.90467024388923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent developments in the field of diffusion models have demonstrated an
exceptional capacity to generate high-quality prompt-conditioned image edits.
Nevertheless, previous approaches have primarily relied on textual prompts for
image editing, which tend to be less effective when making precise edits to
specific objects or fine-grained regions within a scene containing
single/multiple objects. We introduce a novel framework for zero-shot localized
multi-object editing through a multi-diffusion process to overcome this
challenge. This framework empowers users to perform various operations on
objects within an image, such as adding, replacing, or editing $\textbf{many}$
objects in a complex scene $\textbf{in one pass}$. Our approach leverages
foreground masks and corresponding simple text prompts that exert localized
influences on the target regions resulting in high-fidelity image editing. A
combination of cross-attention and background preservation losses within the
latent space ensures that the characteristics of the object being edited are
preserved while simultaneously achieving a high-quality, seamless
reconstruction of the background with fewer artifacts compared to the current
methods. We also curate and release a dataset dedicated to multi-object
editing, named $\texttt{LoMOE}$-Bench. Our experiments against existing
state-of-the-art methods demonstrate the improved effectiveness of our approach
in terms of both image editing quality and inference speed.
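The abstract describes the mechanism at a high level: per-object foreground masks paired with simple prompts steer a multi-diffusion denoising process, while cross-attention and background preservation losses in latent space keep unedited content intact. Below is a minimal sketch of what a mask-weighted blend of per-region noise predictions and a latent-space background term might look like; the `denoise_step` callable, tensor shapes, and loss form are illustrative assumptions, not the paper's actual implementation, and the cross-attention loss is omitted because its exact form is not given in this listing.

```python
# Hedged sketch of one mask-guided multi-diffusion denoising step.
# `denoise_step(latent, t, prompt)` stands in for a text-conditioned
# diffusion denoiser (e.g. a UNet wrapper); it is an assumption, not LoMOE's API.
import torch

def multi_diffusion_step(latent, t, regions, background_prompt, denoise_step):
    """Blend per-region noise predictions weighted by their foreground masks.

    latent:  (1, C, H, W) current latent
    regions: list of (mask, prompt) pairs; each mask is (1, 1, H, W) in [0, 1]
    """
    # The background prediction covers everything not claimed by an edit mask.
    bg_mask = torch.clamp(
        1.0 - torch.stack([m for m, _ in regions]).sum(dim=0), 0.0, 1.0
    )
    blended = bg_mask * denoise_step(latent, t, background_prompt)

    # Each object prompt only influences its own masked region.
    for mask, prompt in regions:
        blended = blended + mask * denoise_step(latent, t, prompt)
    return blended

def background_preservation_loss(edited_latent, source_latent, regions):
    """Penalize latent drift outside all edit masks (illustrative form only)."""
    fg = torch.clamp(torch.stack([m for m, _ in regions]).sum(dim=0), 0.0, 1.0)
    return ((1.0 - fg) * (edited_latent - source_latent)).pow(2).mean()
```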
Related papers
- Move and Act: Enhanced Object Manipulation and Background Integrity for Image Editing [63.32399428320422]
We propose a tuning-free method with only two branches: inversion and editing.
This approach allows users to simultaneously edit the object's action and control the generation position of the edited object.
Impressive image editing results and quantitative evaluation demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-07-25T08:00:49Z) - Zero-shot Image Editing with Reference Imitation [50.75310094611476]
We present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently.
We propose a generative training framework, dubbed MimicBrush, which randomly selects two frames from a video clip, masks some regions of one frame, and learns to recover the masked regions using the information from the other frame.
We experimentally show the effectiveness of our method under various test cases as well as its superiority over existing alternatives.
arXiv Detail & Related papers (2024-06-11T17:59:51Z) - ParallelEdits: Efficient Multi-Aspect Text-Driven Image Editing with Attention Grouping [31.026083872774834]
ParallelEdits is a method that seamlessly manages simultaneous edits across multiple attributes.
The PIE-Bench++ dataset is a benchmark for evaluating text-driven image editing methods in multifaceted scenarios.
arXiv Detail & Related papers (2024-06-03T04:43:56Z) - DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing [22.855660721387167]
We transform the spatial-aware image editing task into a combination of two sub-tasks: multi-layered latent decomposition and multi-layered latent fusion.
We show that our approach consistently surpasses the latest spatial editing methods, including Self-Guidance and DiffEditor.
arXiv Detail & Related papers (2024-03-21T15:35:42Z) - An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control [21.624984690721842]
D-Edit is a framework to disentangle the comprehensive image-prompt interaction into several item-prompt interactions.
It is based on pretrained diffusion models with cross-attention layers disentangled and adopts a two-step optimization to build item-prompt associations.
We demonstrate state-of-the-art results in four types of editing operations including image-based, text-based, mask-based editing, and item removal.
arXiv Detail & Related papers (2024-03-07T20:06:29Z) - DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z) - Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
arXiv Detail & Related papers (2023-10-18T17:59:02Z) - LayerDiffusion: Layered Controlled Image Editing with Diffusion Models [5.58892860792971]
LayerDiffusion is a semantic-based layered controlled image editing method.
We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy.
Experimental results demonstrate the effectiveness of our method in generating highly coherent images.
arXiv Detail & Related papers (2023-05-30T01:26:41Z) - PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor [135.17302411419834]
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in the image.
We show that having control over the properties of each object in an image leads to comprehensive editing capabilities.
Our framework allows for various object-level editing operations on real images such as reference image-based appearance editing, free-form shape editing, adding objects, and variations.
arXiv Detail & Related papers (2023-03-30T17:13:56Z) - DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited (see the sketch after this list).
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
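The DiffEdit entry above mentions automatic mask generation; the published idea is to contrast the denoiser's noise predictions under the source and edit prompts and threshold their difference. The sketch below illustrates that idea only: the `noise_pred` callable, the simplified noising, the chosen timesteps, and the threshold are assumptions for illustration, not DiffEdit's exact procedure.

```python
# Hedged sketch of DiffEdit-style automatic mask generation: the mask is
# derived from where noise predictions diverge between the two prompts.
# `noise_pred(noisy_latent, t, prompt)` is an assumed denoiser wrapper.
import torch

def estimate_edit_mask(latent, noise_pred, source_prompt, edit_prompt,
                       timesteps=(200, 300, 400), threshold=0.5):
    diffs = []
    for t in timesteps:
        noise = torch.randn_like(latent)
        noisy = latent + noise  # simplified noising; real schedules scale both terms
        diff = (noise_pred(noisy, t, edit_prompt)
                - noise_pred(noisy, t, source_prompt)).abs().mean(dim=1, keepdim=True)
        diffs.append(diff)
    diff = torch.stack(diffs).mean(dim=0)   # average divergence over timesteps
    diff = diff / (diff.max() + 1e-8)       # normalize to [0, 1]
    return (diff > threshold).float()       # binary mask of regions to edit
```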