LIME: Localized Image Editing via Attention Regularization in Diffusion Models
- URL: http://arxiv.org/abs/2312.09256v2
- Date: Thu, 05 Dec 2024 15:05:55 GMT
- Title: LIME: Localized Image Editing via Attention Regularization in Diffusion Models
- Authors: Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari
- Abstract summary: This paper introduces LIME for localized image editing in diffusion models.
LIME does not require user-specified regions of interest (RoI) or additional text input; instead, it employs features from pre-trained methods and a straightforward clustering method to obtain a precise editing mask.
We propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring localized edits.
- Score: 69.33072075580483
- Abstract: Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images with recent advancements in text-to-image generation. The research focus is now shifting towards the controllability of DMs. A significant challenge within this domain is localized editing, where specific areas of an image are modified without affecting the rest of the content. This paper introduces LIME for localized image editing in diffusion models. LIME does not require user-specified regions of interest (RoI) or additional text input; instead, it employs features from pre-trained methods and a straightforward clustering method to obtain a precise editing mask. Then, by leveraging cross-attention maps, it refines these segments to find the regions to be edited. Finally, we propose a novel cross-attention regularization technique that penalizes unrelated cross-attention scores in the RoI during the denoising steps, ensuring localized edits. Our approach, without re-training, fine-tuning, or additional user input, consistently improves the performance of existing methods on various editing benchmarks. The project page can be found at https://enisimsar.github.io/LIME/.
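The cross-attention regularization described in the abstract can be sketched in a few lines: inside the RoI, the pre-softmax attention logits of prompt tokens unrelated to the edit are pushed down, so the softmax concentrates attention on the edit tokens. This is a minimal, hypothetical NumPy illustration of the general idea; the function name, tensor shapes, and the `penalty` value are assumptions, not the authors' implementation.

```python
import numpy as np

def regularize_cross_attention(scores, roi_mask, edit_token_ids, penalty=-10.0):
    """Penalize cross-attention logits of tokens unrelated to the edit
    inside the region of interest (RoI), before the softmax.

    scores:         (num_pixels, num_tokens) pre-softmax attention logits
    roi_mask:       (num_pixels,) boolean mask, True inside the RoI
    edit_token_ids: indices of prompt tokens that describe the edit
    penalty:        large negative value added to unrelated logits in the RoI
    """
    scores = scores.copy()
    unrelated = np.ones(scores.shape[1], dtype=bool)
    unrelated[edit_token_ids] = False
    # Only the (RoI pixel, unrelated token) entries are penalized;
    # pixels outside the RoI are left untouched.
    scores[np.ix_(roi_mask, unrelated)] += penalty
    return scores

# Toy example: 4 pixels, 3 prompt tokens, token 2 describes the edit.
logits = np.zeros((4, 3))
roi = np.array([True, True, False, False])
reg = regularize_cross_attention(logits, roi, edit_token_ids=[2])
attn = np.exp(reg) / np.exp(reg).sum(axis=1, keepdims=True)
```

In the toy example, attention inside the RoI collapses onto the edit token, while rows outside the RoI keep their original (here uniform) distribution, which is the "localized edit" behavior the regularizer is meant to enforce.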
Related papers
- EditScout: Locating Forged Regions from Diffusion-based Edited Images with Multimodal LLM [50.054404519821745]
We present a novel framework that integrates a multimodal Large Language Model for enhanced reasoning capabilities.
Our framework achieves promising results on MagicBrush, AutoSplice, and PerfBrush datasets.
Notably, our method excels on the PerfBrush dataset, a self-constructed test set featuring previously unseen types of edits.
arXiv Detail & Related papers (2024-12-05T02:05:33Z)
- TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks - the "edit-friendly" DDPM-noise inversion approach.
We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
arXiv Detail & Related papers (2024-08-01T17:27:28Z)
- LocInv: Localization-aware Inversion for Text-Guided Image Editing [17.611103794346857]
Text-guided image editing research aims to empower users to manipulate generated images by altering the text prompts.
Existing image editing techniques are prone to editing over unintentional regions that are beyond the intended target area.
We propose localization-aware Inversion (LocInv), which exploits segmentation maps or bounding boxes as extra localization priors to refine the cross-attention maps.
arXiv Detail & Related papers (2024-05-02T17:27:04Z)
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [115.49488548588305]
A significant research effort is focused on exploiting the capabilities of pretrained diffusion models for image editing.
They either finetune the model, or invert the image in the latent space of the pretrained model.
These approaches suffer from two problems: unsatisfactory results in the selected regions and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z)
- Region-Aware Diffusion for Zero-shot Text-driven Image Editing [78.58917623854079]
We propose a novel region-aware diffusion model (RDM) for entity-level image editing.
To strike a balance between image fidelity and inference speed, we design an intensive diffusion pipeline.
The results show that RDM outperforms the previous approaches in terms of visual quality, overall harmonization, non-editing region content preservation, and text-image semantic consistency.
arXiv Detail & Related papers (2023-02-23T06:20:29Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
- Blended Latent Diffusion [18.043090347648157]
We present an accelerated solution to the task of local text-driven editing of generic images, where the desired edits are confined to a user-provided mask.
Our solution leverages a recent text-to-image Latent Diffusion Model (LDM), which speeds up diffusion by operating in a lower-dimensional latent space.
arXiv Detail & Related papers (2022-06-06T17:58:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.