Related papers: Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization

URL: http://arxiv.org/abs/2405.20584v2
Date: Fri, 26 Jul 2024 02:10:04 GMT
Title: Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization
Authors: Yisu Liu, Jinyang An, Wanqian Zhang, Dayan Wu, Jingzi Gu, Zheng Lin, Weiping Wang
Abstract summary: Malicious users have misused diffusion-based customization methods like DreamBooth to create fake images. In this paper, we propose DisDiff, a novel adversarial attack method to disrupt the diffusion model outputs.
Score: 19.635385099376066
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the development of diffusion-based customization methods like DreamBooth, individuals can now train models that generate their personalized images. Despite the convenience, malicious users have misused these techniques to create fake images, thereby triggering a privacy security crisis. In light of this, proactive adversarial attacks are proposed to protect users against customization. The adversarial examples are trained to distort the customization model's outputs and thus block the misuse. In this paper, we propose DisDiff (Disrupting Diffusion), a novel adversarial attack method to disrupt the diffusion model outputs. We first delve into the intrinsic image-text relationships, known as cross-attention, and empirically find that the subject-identifier token plays an important role in guiding image generation. Thus, we propose the Cross-Attention Erasure module to explicitly "erase" the indicated attention maps and disrupt the text guidance. Besides, we analyze the influence of the sampling process of the diffusion model on the Projected Gradient Descent (PGD) attack and introduce a novel Merit Sampling Scheduler to adaptively modulate the perturbation updating amplitude in a step-aware manner. On average, across two facial benchmarks and two commonly used prompts, our DisDiff outperforms the state-of-the-art methods by 12.75% in FDFR scores and 7.25% in ISM scores.

Related papers

Noise as a Probe: Membership Inference Attacks on Diffusion Models Leveraging Initial Noise [51.179816451161635]
Diffusion models have achieved remarkable progress in image generation, but their increasing deployment raises serious concerns about privacy. In this work, we utilize a critical yet overlooked vulnerability: the widely used noise schedules fail to fully eliminate semantic information in the images. We propose a simple yet effective membership inference attack, which injects semantic information into the initial noise and infers membership by analyzing the model's generation result.
arXiv Detail & Related papers (2026-01-29T12:29:01Z)
ExposeAnyone: Personalized Audio-to-Expression Diffusion Models Are Robust Zero-Shot Face Forgery Detectors [58.45131932883374]
We propose a fully self-supervised approach to detect deepfakes in videos. Our model computes the identity distances between suspected videos and personalized subjects via diffusion reconstruction errors. Our method is highly robust to corruptions such as blur and compression, highlighting the applicability in real-world face forgery detection.
arXiv Detail & Related papers (2026-01-05T18:59:54Z)
Latent Diffusion Unlearning: Protecting Against Unauthorized Personalization Through Trajectory Shifted Perturbations [18.024767641200064]
We propose a model-based perturbation strategy that operates within the latent space of diffusion models. Our method alternates between denoising and inversion while modifying the starting point of the denoising trajectory of diffusion models. We validate our approach on four benchmark datasets to demonstrate robustness against state-of-the-art inversion attacks.
arXiv Detail & Related papers (2025-10-03T15:18:45Z)
PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting [25.24109316946351]
We propose PromptFlare, a novel adversarial protection method designed to protect images from malicious modifications facilitated by diffusion-based inpainting models. Our approach exploits the intrinsic properties of prompt embeddings and injects adversarial noise to suppress the sampling process. Experiments on the EditBench dataset demonstrate that our method achieves state-of-the-art performance across various metrics.
arXiv Detail & Related papers (2025-08-22T08:42:46Z)
Active Adversarial Noise Suppression for Image Forgery Localization [56.98050814363447]
We introduce an Adversarial Noise Suppression Module (ANSM) that generates a defensive perturbation to suppress the attack effect of adversarial noise. To the best of our knowledge, this is the first report of adversarial defense in image forgery localization tasks.
arXiv Detail & Related papers (2025-06-15T14:53:27Z)
Embedding Hidden Adversarial Capabilities in Pre-Trained Diffusion Models [1.534667887016089]
We introduce a new attack paradigm that embeds hidden adversarial capabilities directly into diffusion models via fine-tuning. The resulting tampered model generates high-quality images indistinguishable from those of the original. We demonstrate the effectiveness and stealthiness of our approach, uncovering a covert attack vector that raises new security concerns.
arXiv Detail & Related papers (2025-04-05T12:51:36Z)
Make the Most of Everything: Further Considerations on Disrupting Diffusion-based Customization [11.704329867109237]
We propose Dual Anti-Diffusion (DADiff), a two-stage adversarial attack targeting diffusion customization. Experimental results on various mainstream facial datasets demonstrate 10%-30% improvements in cross-prompt, keyword mismatch, cross-model, and cross-mechanism anti-customization.
arXiv Detail & Related papers (2025-03-18T06:22:03Z)
Privacy Protection in Personalized Diffusion Models via Targeted Cross-Attention Adversarial Attack [5.357486699062561]
We propose a novel and efficient adversarial attack method, Concept Protection by Selective Attention Manipulation (CoPSAM). For this purpose, we carefully construct an imperceptible noise to be added to clean samples to get their adversarial counterparts. Experimental validation on a subset of the CelebA-HQ face images dataset demonstrates that our approach outperforms existing methods.
arXiv Detail & Related papers (2024-11-25T14:39:18Z)
Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models [27.83772742404565]
We introduce a Prompt-Agnostic Adversarial Perturbation (PAP) method for customized diffusion models. PAP first models the prompt distribution using a Laplace Approximation, and then produces prompt-agnostic perturbations by maximizing a disturbance expectation. This approach effectively tackles the prompt-agnostic attacks, leading to improved defense stability.
arXiv Detail & Related papers (2024-08-20T06:17:56Z)
DDAP: Dual-Domain Anti-Personalization against Text-to-Image Diffusion Models [18.938687631109925]
Diffusion-based personalized visual content generation technologies have achieved significant breakthroughs. However, when misused to fabricate fake news or unsettling content targeting individuals, these technologies could cause considerable societal harm. This paper introduces a novel Dual-Domain Anti-Personalization framework (DDAP). By alternating between these two methods, we construct the DDAP framework, effectively harnessing the strengths of both domains.
arXiv Detail & Related papers (2024-07-29T16:11:21Z)
Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications. Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space. We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks [41.531913152661296]
We formulate the problem of targeted adversarial attack on Stable Diffusion and propose a framework to generate adversarial prompts. Specifically, we design a gradient-based embedding optimization method to craft reliable adversarial prompts that guide stable diffusion to generate specific images. After obtaining successful adversarial prompts, we reveal the mechanisms that cause the vulnerability of the model.
arXiv Detail & Related papers (2024-01-16T12:15:39Z)
Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space. Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings. The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z)
DiffProtect: Generate Adversarial Examples with Diffusion Models for Facial Privacy Protection [64.77548539959501]
DiffProtect produces more natural-looking encrypted images than state-of-the-art methods. It achieves significantly higher attack success rates, e.g., 24.5% and 25.1% absolute improvements on the CelebA-HQ and FFHQ datasets.
arXiv Detail & Related papers (2023-05-23T02:45:49Z)
Guided Diffusion Model for Adversarial Purification [103.4596751105955]
Adversarial attacks disturb deep neural networks (DNNs) in various algorithms and frameworks. We propose a novel purification approach, referred to as guided diffusion model for purification (GDMP). In our comprehensive experiments across various datasets, the proposed GDMP is shown to reduce the perturbations raised by adversarial attacks to a shallow range.
arXiv Detail & Related papers (2022-05-30T10:11:15Z)
Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model. We propose DiffPure, which uses diffusion models for adversarial purification. Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z)
Dual Spoof Disentanglement Generation for Face Anti-spoofing with Depth Uncertainty Learning [54.15303628138665]
Face anti-spoofing (FAS) plays a vital role in preventing face recognition systems from presentation attacks. Existing face anti-spoofing datasets lack diversity due to the insufficient identity and insignificant variance. We propose the Dual Spoof Disentanglement Generation framework to tackle this challenge by "anti-spoofing via generation".
arXiv Detail & Related papers (2021-12-01T15:36:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.