MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance
- URL: http://arxiv.org/abs/2312.11396v2
- Date: Thu, 21 Dec 2023 06:39:15 GMT
- Title: MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance
- Authors: Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou
- Abstract summary: We develop MAG-Edit, a training-free, inference-stage optimization method, which enables localized image editing in complex scenarios.
In particular, MAG-Edit optimizes the noise latent feature in diffusion models by maximizing two mask-based cross-attention constraints.
- Score: 28.212908146852197
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent diffusion-based image editing approaches have exhibited impressive
editing capabilities in images with simple compositions. However, localized
editing in complex scenarios has not been well-studied in the literature,
despite its growing real-world demands. Existing mask-based inpainting methods
fall short of retaining the underlying structure within the edit region.
Meanwhile, mask-free attention-based methods often exhibit editing leakage and
misalignment in more complex compositions. In this work, we develop MAG-Edit, a
training-free, inference-stage optimization method, which enables localized
image editing in complex scenarios. In particular, MAG-Edit optimizes the noise
latent feature in diffusion models by maximizing two mask-based cross-attention
constraints of the edit token, which in turn gradually enhances the local
alignment with the desired prompt. Extensive quantitative and qualitative
experiments demonstrate the effectiveness of our method in achieving both text
alignment and structure preservation for localized editing within complex
scenarios.
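
To make the mechanism concrete, here is a minimal PyTorch sketch of this kind of inference-stage latent optimization. It is a sketch under assumptions, not the paper's implementation: a single ratio-style constraint stands in for MAG-Edit's two cross-attention constraints, and attn_fn is a hypothetical hook that runs the denoising network on the latent and returns its cross-attention maps.

```python
import torch

def mask_ratio_loss(attn_maps, mask, edit_token_idx):
    # attn_maps: (heads, H*W, num_tokens) softmax cross-attention maps
    # mask:      (H*W,) binary edit-region mask, flattened spatially
    a = attn_maps[..., edit_token_idx]       # (heads, H*W) edit-token attention
    inside = (a * mask).sum(dim=-1)          # attention mass inside the mask
    total = a.sum(dim=-1) + 1e-8             # total attention mass
    return -(inside / total).mean()          # maximize the in-mask ratio

def guided_step(z_t, attn_fn, mask, edit_token_idx, lr=0.1, n_iters=5):
    # Inference-stage optimization of the noisy latent z_t: nudge it so the
    # edit token's cross-attention concentrates inside the mask, then hand
    # the updated latent back to the ordinary denoising loop.
    z = z_t.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_iters):
        attn = attn_fn(z)  # hypothetical hook: UNet forward pass -> attention maps
        loss = mask_ratio_loss(attn, mask, edit_token_idx)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```

In the spirit of the abstract, raising the edit token's in-mask attention over successive denoising steps gradually strengthens local alignment with the desired prompt while the frozen model preserves structure elsewhere.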
Related papers
- PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models [80.98455219375862]
We present the first text-based image editing approach for object parts based on pre-trained diffusion models.
Our approach is preferred by users 77-90% of the time in user studies.
arXiv Detail & Related papers (2025-02-06T13:08:43Z) - MADiff: Text-Guided Fashion Image Editing with Mask Prediction and Attention-Enhanced Diffusion [9.149799210311468]
The MADiff model is proposed to more accurately identify the editing region.
The Attention-Enhanced Diffusion Model is proposed to strengthen the editing magnitude.
Our proposed method can accurately predict the mask of the editing region and significantly enhance the editing magnitude in fashion image editing.
arXiv Detail & Related papers (2024-12-28T07:34:49Z) - BrushEdit: All-In-One Image Inpainting and Editing [79.55816192146762]
BrushEdit is a novel inpainting-based instruction-guided image editing paradigm.
We devise a system enabling free-form instruction editing by integrating MLLMs and a dual-branch image inpainting model.
Our framework effectively combines MLLMs and inpainting models, achieving superior performance across seven metrics.
arXiv Detail & Related papers (2024-12-13T17:58:06Z) - Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific editing tasks.
TODInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z) - TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks: the "edit-friendly" DDPM-noise inversion approach.
We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
arXiv Detail & Related papers (2024-08-01T17:27:28Z) - LoMOE: Localized Multi-Object Editing via Multi-Diffusion [8.90467024388923]
We introduce a novel framework for zero-shot localized multi-object editing through a multi-diffusion process.
Our approach leverages foreground masks and corresponding simple text prompts that exert localized influences on the target regions.
A combination of cross-attention and background losses within the latent space ensures that the characteristics of the object being edited are preserved.
arXiv Detail & Related papers (2024-03-01T10:46:47Z) - Noise Map Guidance: Inversion with Spatial Context for Real Image Editing [23.513950664274997]
Text-guided diffusion models have become a popular tool in image synthesis, known for producing high-quality and diverse images.
Their application to editing real images often encounters hurdles: the text condition degrades reconstruction quality, which in turn harms editing fidelity.
We present Noise Map Guidance (NMG), an inversion method rich in spatial context, tailored for real-image editing.
arXiv Detail & Related papers (2024-02-07T07:16:12Z) - LIME: Localized Image Editing via Attention Regularization in Diffusion Models [69.33072075580483]
This paper introduces LIME for localized image editing in diffusion models.
LIME does not require user-specified regions of interest (RoI) or additional text input, but rather employs features from pre-trained methods and a straightforward clustering step to obtain a precise editing mask.
We propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring localized edits; a sketch of this idea appears after this list.
arXiv Detail & Related papers (2023-12-14T18:59:59Z) - Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By adjusting the influence of each loss function, we build a flexible editing solution that can be tuned to user preferences (see the loss-weighting sketch after this list).
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
arXiv Detail & Related papers (2023-11-28T15:31:11Z)
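
The cross-attention regularization described in the LIME entry above can be pictured with a short sketch. The penalty form, tensor shapes, and values below are illustrative assumptions, not the paper's exact scheme: unrelated tokens' attention scores are lowered at RoI locations before the softmax, so the edit tokens dominate attention there.

```python
import torch

def penalize_unrelated_scores(attn_logits, roi_mask, edit_token_ids, penalty=10.0):
    # attn_logits: (heads, H*W, T) pre-softmax cross-attention scores
    # roi_mask:    (H*W,) 1.0 inside the editing region, 0.0 outside
    T = attn_logits.shape[-1]
    unrelated = torch.ones(T, device=attn_logits.device)
    unrelated[edit_token_ids] = 0.0          # leave edit tokens untouched
    # Broadcast the penalty over (RoI location, unrelated token) pairs only.
    penalty_map = penalty * roi_mask.view(1, -1, 1) * unrelated.view(1, 1, -1)
    return attn_logits - penalty_map
```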
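
Several entries above follow the same inference-time, loss-guided pattern, stated most directly by the Optimisation-Based Multi-Modal Semantic Image Editing paper. Below is a hedged sketch of that loss-weighting idea; the loss functions, weights, and optimiser settings are placeholders, not any paper's actual code.

```python
import torch

def composite_edit_loss(latent, loss_fns, weights):
    # Weighted sum of per-instruction losses,
    # e.g. loss_fns = {"text": ..., "pose": ..., "scribble": ...}.
    return sum(weights[name] * fn(latent) for name, fn in loss_fns.items())

def optimise_latent(z, loss_fns, weights, lr=0.05, steps=20):
    # Raising a weight strengthens that instruction's influence on the edit.
    z = z.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = composite_edit_loss(z, loss_fns, weights)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```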