Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance
- URL: http://arxiv.org/abs/2412.12974v3
- Date: Thu, 19 Dec 2024 08:41:19 GMT
- Title: Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance
- Authors: Wenhao Sun, Benlei Cui, Xue-Mei Dong, Jingqun Tang,
- Abstract summary: Attentive Eraser is a tuning-free method to empower pre-trained diffusion models for stable and effective object removal.
We introduce Attention Activation and Suppression (ASS), which re-engineers the self-attention mechanism.
We also introduce Self-Attention Redirection Guidance (SARG), which utilizes the self-attention redirected by ASS to guide the generation process.
- Score: 4.295971864740951
- Abstract: Recently, diffusion models have emerged as promising newcomers in the field of generative models, shining brightly in image generation. However, when employed for object removal tasks, they still encounter issues such as generating random artifacts and the incapacity to repaint foreground object areas with appropriate content after removal. To tackle these problems, we propose Attentive Eraser, a tuning-free method to empower pre-trained diffusion models for stable and effective object removal. Firstly, in light of the observation that the self-attention maps influence the structure and shape details of the generated images, we propose Attention Activation and Suppression (ASS), which re-engineers the self-attention mechanism within the pre-trained diffusion models based on the given mask, thereby prioritizing the background over the foreground object during the reverse generation process. Moreover, we introduce Self-Attention Redirection Guidance (SARG), which utilizes the self-attention redirected by ASS to guide the generation process, effectively removing foreground objects within the mask while simultaneously generating content that is both plausible and coherent. Experiments demonstrate the stability and effectiveness of Attentive Eraser in object removal across a variety of pre-trained diffusion models, outperforming even training-based methods. Furthermore, Attentive Eraser can be implemented in various diffusion model architectures and checkpoints, enabling excellent scalability. Code is available at https://github.com/Anonym0u3/AttentiveEraser.
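The two ideas in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes one simple way of prioritizing the background, namely masking out attention to foreground keys before the softmax, and it assumes a guidance step that extrapolates from the unmodified noise prediction toward the redirected one, in the style of classifier-free guidance. The names `self_attention`, `key_mask`, `guided_noise`, and `scale` are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v, key_mask=None):
    """Single-head self-attention over N tokens of dimension d.

    key_mask (assumption): boolean array of shape (N,), True for tokens
    inside the object mask. Attention to those keys is suppressed, so
    every query draws its output from background tokens only.
    At least one token must be unmasked.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if key_mask is not None:
        scores = np.where(key_mask[None, :], -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def guided_noise(eps_plain, eps_redirected, scale):
    # Assumed guidance form: move from the plain prediction toward the
    # prediction obtained with redirected self-attention.
    return eps_plain + scale * (eps_redirected - eps_plain)

# Toy usage: 6 tokens of dimension 4, tokens 2 and 3 are "foreground".
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))
foreground = np.array([False, False, True, True, False, False])

plain_out, _ = self_attention(tokens, tokens, tokens)
redirected_out, attn = self_attention(tokens, tokens, tokens, foreground)

# Stand-ins for the two noise predictions of a denoiser.
eps = guided_noise(plain_out, redirected_out, scale=2.0)
print(attn[:, foreground].sum())  # attention mass left on foreground keys
```

In a real diffusion pipeline the masking would be applied inside the self-attention layers of the denoiser at each reverse step, and the mask would be resized to each layer's token resolution; both details are left out of this sketch.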
Related papers
- SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders [4.013156524547073]
Diffusion models can inadvertently generate harmful or undesirable content.
Recent machine unlearning approaches offer potential solutions but often lack transparency.
We introduce SAeUron, a novel method leveraging features learned by sparse autoencoders.
arXiv Detail & Related papers (2025-01-29T23:29:47Z)
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer [95.80384464922147]
Continuous visual generation requires the full-sequence diffusion-based approach.
We present ACDiT, an Autoregressive blockwise Conditional Diffusion Transformer.
We demonstrate that ACDiT can be seamlessly used in visual understanding tasks despite being trained on the diffusion objective.
arXiv Detail & Related papers (2024-12-10T18:13:20Z)
- Boosting Alignment for Post-Unlearning Text-to-Image Generative Models [55.82190434534429]
Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data.
This often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns.
We propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives.
arXiv Detail & Related papers (2024-12-09T21:36:10Z)
- Mask-guided cross-image attention for zero-shot in-silico histopathologic image generation with a diffusion model [0.10910416614141322]
Diffusion models are the state-of-the-art solution for generating in-silico images.
Appearance transfer diffusion models are designed for natural images.
In computational pathology, specifically in oncology, it is not straightforward to define which objects in an image should be classified as foreground and background.
We contribute to the applicability of appearance transfer models to diffusion-stained images by modifying the appearance transfer guidance to alternate between class-specific AdaIN feature statistics matchings.
arXiv Detail & Related papers (2024-07-16T12:36:26Z)
- CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models [16.58831310165623]
CLIPAway is a novel approach leveraging CLIP embeddings to focus on background regions while excluding foreground elements.
It enhances inpainting accuracy and quality by identifying embeddings that prioritize the background.
Unlike other methods that rely on specialized training datasets or costly manual annotations, CLIPAway provides a flexible, plug-and-play solution.
arXiv Detail & Related papers (2024-06-13T17:50:28Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors [56.82596340418697]
We propose a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors.
Comprehensive investigations unveil potential characteristics of Vermouth, such as varying granularity of perception concealed in latent variables at distinct time steps and various U-net stages.
The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
arXiv Detail & Related papers (2024-01-29T10:36:57Z)
- Erasing Undesirable Influence in Diffusion Models [51.225365010401006]
Diffusion models are highly effective at generating high-quality images but pose risks, such as the unintentional generation of NSFW (not safe for work) content.
In this work, we introduce EraseDiff, an algorithm designed to preserve the utility of the diffusion model on retained data while removing the unwanted information associated with the data to be forgotten.
arXiv Detail & Related papers (2024-01-11T09:30:36Z)
- Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z)
- Object-Centric Slot Diffusion [30.722428924152382]
We introduce Latent Slot Diffusion (LSD), a novel model that serves dual purposes.
We demonstrate that LSD significantly outperforms state-of-the-art transformer-based decoders.
We also conduct a preliminary investigation into the integration of pre-trained diffusion models in LSD.
arXiv Detail & Related papers (2023-03-20T02:40:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.