Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
- URL: http://arxiv.org/abs/2401.07709v2
- Date: Tue, 23 Jan 2024 11:22:03 GMT
- Title: Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
- Authors: Siyu Zou, Jiji Tang, Yiyi Zhou, Jing He, Chaoyi Zhao, Rongsheng Zhang, Zhipeng Hu, Xiaoshuai Sun
- Abstract summary: In this paper, we propose a novel and efficient image editing method for Text-to-Image (T2I) diffusion models, termed Instant Diffusion Editing (InstDiffEdit).
In particular, InstDiffEdit aims to employ the cross-modal attention ability of existing diffusion models to achieve instant mask guidance during the diffusion steps.
To supplement the existing evaluations of Diffusion-based Image Editing (DIE), we propose a new benchmark called Editing-Mask to examine the mask accuracy and local editing ability of existing methods.
- Score: 43.079272743475435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion-based Image Editing (DIE) is an emerging research hotspot, which often applies a semantic mask to control the target area for diffusion-based editing. However, most existing solutions obtain these masks via manual operations or offline processing, greatly reducing their efficiency. In this paper, we propose a novel and efficient image editing method for Text-to-Image (T2I) diffusion models, termed Instant Diffusion Editing (InstDiffEdit). In particular, InstDiffEdit aims to employ the cross-modal attention ability of existing diffusion models to achieve instant mask guidance during the diffusion steps. To reduce the noise of attention maps and achieve full automation, we equip InstDiffEdit with a training-free refinement scheme that adaptively aggregates the attention distributions for automatic yet accurate mask generation. Meanwhile, to supplement the existing evaluations of DIE, we propose a new benchmark called Editing-Mask to examine the mask accuracy and local editing ability of existing methods. To validate InstDiffEdit, we also conduct extensive experiments on ImageNet and Imagen and compare it with a set of SOTA methods. The experimental results show that InstDiffEdit not only outperforms the SOTA methods in both image quality and editing results, but also achieves a much faster inference speed, i.e., 5 to 6 times faster.
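The method described above amounts to reading out the cross-modal attention maps that the T2I model already computes and aggregating them into a binary editing mask at denoising time. The following is a minimal sketch of that aggregation step, assuming attention maps of shape (layers, heads, H*W, tokens); the averaging, the top-k thresholding rule, and all names are illustrative placeholders, not the paper's training-free refinement scheme.

```python
# Sketch: turning cross-attention maps into an editing mask during denoising,
# in the spirit of InstDiffEdit. All names and rules here are assumptions.
import torch

def attention_to_mask(attn_maps: torch.Tensor, target_token_ids: list[int],
                      keep_ratio: float = 0.3) -> torch.Tensor:
    """attn_maps: (layers, heads, H*W, num_tokens) cross-attention weights.
    Returns a binary mask of shape (H*W,) marking the region to edit."""
    # Aggregate over layers and heads, keep only the target tokens' columns.
    avg = attn_maps.mean(dim=(0, 1))                   # (H*W, num_tokens)
    score = avg[:, target_token_ids].mean(dim=-1)      # (H*W,)
    # Normalize to [0, 1], then binarize by keeping the most attended pixels.
    score = (score - score.min()) / (score.max() - score.min() + 1e-8)
    k = max(1, int(keep_ratio * score.numel()))
    threshold = score.topk(k).values.min()
    return (score >= threshold).float()

# Usage with random tensors standing in for real cross-attention maps:
attn = torch.rand(4, 8, 64 * 64, 77).softmax(dim=-1)   # (layers, heads, HW, tokens)
mask = attention_to_mask(attn, target_token_ids=[5])    # token 5 = the edit word
```

In a full pipeline, the resulting mask would then restrict each denoising update to the target region, analogous to the masked latent blending sketched for Blended Latent Diffusion at the end of the related-papers list below.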
Related papers
- PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models [80.98455219375862]
We present the first text-based image editing approach for object parts based on pre-trained diffusion models.
Our approach is preferred by users 77-90% of the time in user studies.
arXiv Detail & Related papers (2025-02-06T13:08:43Z)
- MADiff: Text-Guided Fashion Image Editing with Mask Prediction and Attention-Enhanced Diffusion [9.149799210311468]
The MADiff model is proposed to more accurately identify the editing region.
The Attention-Enhanced Diffusion Model is proposed to strengthen the editing magnitude.
Our proposed method can accurately predict the mask of the editing region and significantly enhance the editing magnitude in fashion image editing.
arXiv Detail & Related papers (2024-12-28T07:34:49Z)
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT).
We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z)
- TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks - the "edit-friendly" DDPM-noise inversion approach.
We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
arXiv Detail & Related papers (2024-08-01T17:27:28Z)
- Streamlining Image Editing with Layered Diffusion Brushes [8.738398948669609]
Our system renders a single edit on a 512x512 image within 140 ms using a high-end consumer GPU.
Our approach demonstrates efficacy across a range of tasks, including object attribute adjustments, error correction, and sequential prompt-based object placement and manipulation.
arXiv Detail & Related papers (2024-05-01T04:30:03Z)
- LIME: Localized Image Editing via Attention Regularization in Diffusion Models [69.33072075580483]
This paper introduces LIME for localized image editing in diffusion models.
LIME does not require user-specified regions of interest (RoI) or additional text input, but rather employs features from pre-trained methods and a straightforward clustering method to obtain a precise editing mask.
We propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring localized edits.
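Read literally, that regularization can be sketched as follows: before the softmax, the attention logits of tokens unrelated to the edit are penalized at spatial positions inside the RoI, so attention mass inside the RoI concentrates on the edit tokens. The fixed penalty and all names below are assumptions for illustration, not LIME's actual implementation.

```python
# Illustrative sketch of cross-attention regularization in the spirit of LIME:
# inside the RoI, logits of tokens unrelated to the edit are penalized before
# the softmax. The penalty value and all names are assumptions.
import torch

def regularized_cross_attention(scores: torch.Tensor, roi_mask: torch.Tensor,
                                edit_token_ids: list[int],
                                penalty: float = 10.0) -> torch.Tensor:
    """scores: (heads, H*W, num_tokens) pre-softmax attention logits.
    roi_mask: (H*W,) binary mask, 1 inside the region of interest."""
    num_tokens = scores.shape[-1]
    unrelated = torch.ones(num_tokens, dtype=scores.dtype)
    unrelated[edit_token_ids] = 0.0     # edit tokens are left untouched
    # Subtract a fixed penalty from unrelated tokens, but only at RoI positions.
    penalty_map = roi_mask.view(1, -1, 1) * unrelated.view(1, 1, -1) * penalty
    return (scores - penalty_map).softmax(dim=-1)

# Usage with random stand-ins for real attention logits and an RoI mask:
logits = torch.randn(8, 64 * 64, 77)
roi = (torch.rand(64 * 64) > 0.7).float()
attn = regularized_cross_attention(logits, roi, edit_token_ids=[5])
```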
arXiv Detail & Related papers (2023-12-14T18:59:59Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting regions of the input image that need to be edited.
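Concretely, DiffEdit derives this mask by contrasting the model's noise estimates conditioned on the query prompt and on a reference prompt, then thresholding where the two disagree most. The sketch below follows that description; the predict_noise and q_sample interfaces, the number of noise samples, and the threshold are all assumed placeholders rather than the paper's exact procedure.

```python
# Rough sketch of deriving an editing mask by contrasting noise estimates under
# two prompts, as described for DiffEdit. Interfaces below are assumptions.
import torch

def estimate_edit_mask(predict_noise, q_sample, x0: torch.Tensor, t: int,
                       query_prompt: str, reference_prompt: str,
                       n_samples: int = 8, threshold: float = 0.5) -> torch.Tensor:
    """predict_noise(x_t, t, prompt) -> eps and q_sample(x0, t, noise) -> x_t are
    placeholders for a real diffusion model's functions (assumed interfaces)."""
    diffs = []
    for _ in range(n_samples):
        x_t = q_sample(x0, t, torch.randn_like(x0))            # noise the input image
        eps_query = predict_noise(x_t, t, query_prompt)        # noise estimate, edit text
        eps_ref = predict_noise(x_t, t, reference_prompt)      # noise estimate, reference text
        diffs.append((eps_query - eps_ref).abs().mean(dim=0))  # average over channels
    d = torch.stack(diffs).mean(dim=0)                         # average over noise samples
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)             # normalize to [0, 1]
    return (d > threshold).float()                             # 1 = region to edit
```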
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
- Blended Latent Diffusion [18.043090347648157]
We present an accelerated solution to the task of local text-driven editing of generic images, where the desired edits are confined to a user-provided mask.
Our solution leverages a recent text-to-image Latent Diffusion Model (LDM), which speeds up diffusion by operating in a lower-dimensional latent space.
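The blending itself can be sketched in a few lines: at each denoising step, the generated latent is kept only inside the (downsampled) user mask, while everything outside the mask is replaced by a correspondingly noised version of the source latent. All names below are assumptions for illustration, not the paper's code.

```python
# Minimal sketch of mask-guided blending in latent space, in the spirit of
# Blended Latent Diffusion. Names and shapes are illustrative assumptions.
import torch

def blended_step(z_generated: torch.Tensor, z_source_noised: torch.Tensor,
                 latent_mask: torch.Tensor) -> torch.Tensor:
    """latent_mask: (1, H, W), 1 inside the user-provided edit region, 0 outside.
    Keeps the freshly generated latent inside the mask and the (noised) source
    latent everywhere else."""
    m = latent_mask.to(z_generated.dtype)
    return m * z_generated + (1.0 - m) * z_source_noised

# Usage with random stand-ins for real latents and a downsampled user mask:
z_gen, z_src = torch.randn(2, 4, 64, 64)
mask = (torch.rand(1, 64, 64) > 0.7).float()
z_next = blended_step(z_gen, z_src, mask)
```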
arXiv Detail & Related papers (2022-06-06T17:58:04Z)