Related papers: Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance

URL: http://arxiv.org/abs/2412.12974v3
Date: Thu, 19 Dec 2024 08:41:19 GMT
Title: Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance
Authors: Wenhao Sun, Benlei Cui, Xue-Mei Dong, Jingqun Tang,
Abstract summary: Attentive Eraser is a tuning-free method to empower pre-trained diffusion models for stable and effective object removal. We introduce Attention Activation and Suppression (ASS), which re-engineers the self-attention mechanism. We also introduce Self-Attention Redirection Guidance (SARG), which utilizes the self-attention redirected by ASS to guide the generation process.
Score: 4.295971864740951
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, diffusion models have emerged as promising newcomers in the field of generative models, shining brightly in image generation. However, when employed for object removal tasks, they still encounter issues such as generating random artifacts and the incapacity to repaint foreground object areas with appropriate content after removal. To tackle these problems, we propose Attentive Eraser, a tuning-free method to empower pre-trained diffusion models for stable and effective object removal. Firstly, in light of the observation that the self-attention maps influence the structure and shape details of the generated images, we propose Attention Activation and Suppression (ASS), which re-engineers the self-attention mechanism within the pre-trained diffusion models based on the given mask, thereby prioritizing the background over the foreground object during the reverse generation process. Moreover, we introduce Self-Attention Redirection Guidance (SARG), which utilizes the self-attention redirected by ASS to guide the generation process, effectively removing foreground objects within the mask while simultaneously generating content that is both plausible and coherent. Experiments demonstrate the stability and effectiveness of Attentive Eraser in object removal across a variety of pre-trained diffusion models, outperforming even training-based methods. Furthermore, Attentive Eraser can be implemented in various diffusion model architectures and checkpoints, enabling excellent scalability. Code is available at https://github.com/Anonym0u3/AttentiveEraser.
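
Illustrative sketch (not the authors' code): the abstract describes re-engineering self-attention from a mask so that background is prioritized over the foreground object, and then using the redirected attention to guide generation. The NumPy example below shows one simple way such an idea could look. Everything in it is an assumption made for illustration: the function names masked_self_attention and guided_prediction, the choice to block attention toward masked key positions before the softmax, and the linear guidance formula are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def masked_self_attention(q, k, v, fg_mask, neg=-1e9):
    """Self-attention that ignores foreground (masked) key tokens.

    q, k, v : arrays of shape (N, d), one row per image token.
    fg_mask : boolean array of shape (N,), True where a token belongs
              to the object to be removed.
    Assumption: suppression is done by pushing the logits of foreground
    keys to a large negative value, so every query draws only on
    background tokens. At least one background token must exist.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                      # (N, N) similarity
    logits = np.where(fg_mask[None, :], neg, logits)   # block foreground keys
    weights = softmax(logits, axis=-1)
    return weights @ v, weights


def guided_prediction(pred_plain, pred_redirected, scale):
    """Hypothetical guidance step: move the plain prediction toward the
    one computed with redirected attention, by a factor `scale`."""
    return pred_plain + scale * (pred_redirected - pred_plain)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, dim = 6, 4
    q, k, v = (rng.normal(size=(n_tokens, dim)) for _ in range(3))
    fg_mask = np.array([False, False, True, True, False, False])

    out, weights = masked_self_attention(q, k, v, fg_mask)
    print("attention paid to foreground keys:", weights[:, fg_mask].sum())
```

In this sketch every token, including those inside the mask, is rebuilt only from background values, which is the intuition behind filling the removed region with plausible background content. The guidance scale is a free parameter here; the source does not state how it is chosen.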

Related papers

Embedding Hidden Adversarial Capabilities in Pre-Trained Diffusion Models [1.534667887016089]
We introduce a new attack paradigm that embeds hidden adversarial capabilities directly into diffusion models via fine-tuning. The resulting tampered model generates high-quality images indistinguishable from those of the original. We demonstrate the effectiveness and stealthiness of our approach, uncovering a covert attack vector that raises new security concerns.
arXiv Detail & Related papers (2025-04-05T12:51:36Z)
Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways [13.08168394252538]
Erase inpainting aims to precisely remove target objects within masked regions while preserving the overall consistency of the surrounding content. We propose a novel Erase Diffusion, termed EraDiff, aimed at unleashing the potential power of standard diffusion in the context of object removal. Our proposed EraDiff achieves state-of-the-art performance on the OpenImages V5 dataset and demonstrates significant superiority in real-world scenarios.
arXiv Detail & Related papers (2025-03-10T08:06:51Z)
One-for-More: Continual Diffusion Model for Anomaly Detection [61.12622458367425]
Anomaly detection methods utilize diffusion models to generate or reconstruct normal samples when given arbitrary anomaly images. Our study found that the diffusion model suffers from severe "faithfulness hallucination" and "catastrophic forgetting". We propose a continual diffusion model that uses gradient projection to achieve stable continual learning.
arXiv Detail & Related papers (2025-02-27T07:47:27Z)
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer [95.80384464922147]
Continuous visual generation requires the full-sequence diffusion-based approach. We present ACDiT, an Autoregressive blockwise Conditional Diffusion Transformer. We demonstrate that ACDiT can be seamlessly used in visual understanding tasks despite being trained on the diffusion objective.
arXiv Detail & Related papers (2024-12-10T18:13:20Z)
Boosting Alignment for Post-Unlearning Text-to-Image Generative Models [55.82190434534429]
Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data. This often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns. We propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives.
arXiv Detail & Related papers (2024-12-09T21:36:10Z)
Rethinking and Defending Protective Perturbation in Personalized Diffusion Models [21.30373461975769]
We study the fine-tuning process of personalized diffusion models (PDMs) through the lens of shortcut learning. PDMs are susceptible to minor adversarial perturbations, leading to significant degradation when fine-tuned on corrupted datasets. We propose a systematic defense framework that includes data purification and contrastive decoupling learning.
arXiv Detail & Related papers (2024-06-27T07:14:14Z)
CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models [16.58831310165623]
CLIPAway is a novel approach leveraging CLIP embeddings to focus on background regions while excluding foreground elements. It enhances inpainting accuracy and quality by identifying embeddings that prioritize the background. Unlike other methods that rely on specialized training datasets or costly manual annotations, CLIPAway provides a flexible, plug-and-play solution.
arXiv Detail & Related papers (2024-06-13T17:50:28Z)
DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task. We first apply attention masking in each denoising step to make the generation more disentangled across different objects. In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors [56.82596340418697]
We propose a simple yet effective framework, Vermouth, comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors. Comprehensive investigations unveil potential characteristics of Vermouth, such as varying granularity of perception concealed in latent variables at distinct time steps and various U-net stages. The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
arXiv Detail & Related papers (2024-01-29T10:36:57Z)
Erasing Undesirable Influence in Diffusion Models [51.225365010401006]
Diffusion models are highly effective at generating high-quality images but pose risks, such as the unintentional generation of NSFW (not safe for work) content. In this work, we introduce EraseDiff, an algorithm designed to preserve the utility of the diffusion model on retained data while removing the unwanted information associated with the data to be forgotten.
arXiv Detail & Related papers (2024-01-11T09:30:36Z)
Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space. Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings. The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.