Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2504.20518v1
- Date: Tue, 29 Apr 2025 07:59:35 GMT
- Title: Dynamic Attention Analysis for Backdoor Detection in Text-to-Image Diffusion Models
- Authors: Zhongqi Wang, Jie Zhang, Shiguang Shan, Xilin Chen
- Abstract summary: Previous backdoor detection methods primarily focus on the static features of backdoor samples. However, a vital property of diffusion models is their inherent dynamism. This study introduces a novel backdoor detection perspective named Dynamic Attention Analysis (DAA), showing that these dynamic characteristics serve as better indicators for backdoor detection. Our approach significantly surpasses existing detection methods, achieving an average F1 Score of 79.49% and an AUC of 87.67%.
- Score: 70.03122709795122
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent studies have revealed that text-to-image diffusion models are vulnerable to backdoor attacks, where attackers implant stealthy textual triggers to manipulate model outputs. Previous backdoor detection methods primarily focus on the static features of backdoor samples. However, a vital property of diffusion models is their inherent dynamism. This study introduces a novel backdoor detection perspective named Dynamic Attention Analysis (DAA), showing that these dynamic characteristics serve as better indicators for backdoor detection. Specifically, by examining the dynamic evolution of cross-attention maps, we observe that backdoor samples exhibit distinct feature evolution patterns at the <EOS> token compared to benign samples. To quantify these dynamic anomalies, we first introduce DAA-I, which treats the tokens' attention maps as spatially independent and measures the dynamic features using the Frobenius norm. Furthermore, to better capture the interactions between attention maps and refine the feature, we propose a dynamical system-based approach, referred to as DAA-S. This model formulates the spatial correlations among attention maps using a graph-based state equation, and we theoretically analyze its global asymptotic stability. Extensive experiments across five representative backdoor attack scenarios demonstrate that our approach significantly surpasses existing detection methods, achieving an average F1 Score of 79.49% and an AUC of 87.67%. The code is available at https://github.com/Robin-WZQ/DAA.
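For intuition, below is a minimal sketch of how such dynamic scores could be computed, assuming the <EOS> token's cross-attention maps have already been extracted at every denoising step. The function names, tensor shapes, and the adjacency stand-in are illustrative assumptions, not the paper's exact formulation; the authors' reference implementation lives at https://github.com/Robin-WZQ/DAA.

```python
# Hedged sketch of DAA-style dynamic-attention scoring. Everything here is an
# illustrative assumption; see the official repository for the real method.
import numpy as np

def daa_i_score(eos_maps: np.ndarray) -> float:
    """DAA-I-style score: treat the <EOS> token's cross-attention maps as
    spatially independent and measure their evolution across denoising
    timesteps with the Frobenius norm.

    eos_maps: shape (T, H, W), one attention map per denoising timestep.
    """
    diffs = np.diff(eos_maps, axis=0)  # step-to-step change, shape (T-1, H, W)
    return float(np.linalg.norm(diffs, ord="fro", axis=(1, 2)).mean())

def daa_s_score(eos_maps: np.ndarray, adjacency: np.ndarray) -> float:
    """DAA-S-style variant: couple spatial locations through a graph before
    measuring dynamics. `adjacency` is a row-normalized (H*W, H*W) matrix,
    a hypothetical stand-in for the paper's graph-based state equation.
    """
    T = eos_maps.shape[0]
    states = eos_maps.reshape(T, -1) @ adjacency.T  # graph-smoothed states
    return float(np.linalg.norm(np.diff(states, axis=0), axis=1).mean())
```

In either variant, detection reduces to thresholding the score against the distribution observed on benign prompts, e.g., flagging a sample whose score exceeds a percentile threshold fit on clean data.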
Related papers
- Towards Invisible Backdoor Attack on Text-to-Image Diffusion Model [70.03122709795122]
Backdoor attacks targeting text-to-image diffusion models have advanced rapidly. Current backdoor samples often exhibit two key abnormalities compared to benign samples. We propose a novel Invisible Backdoor Attack (IBA) to enhance the stealthiness of backdoor samples.
arXiv Detail & Related papers (2025-03-22T10:41:46Z)
- Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift [104.76588209308666]
This paper explores backdoor attacks in LVLM instruction tuning across mismatched training and testing domains. We introduce a new evaluation dimension, backdoor domain generalization, to assess attack robustness. We propose a multimodal attribution backdoor attack (MABA) that injects domain-agnostic triggers into critical areas.
arXiv Detail & Related papers (2024-06-27T02:31:03Z)
- UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models [19.46962670935554]
Diffusion models are vulnerable to backdoor attacks. We propose a black-box input-level backdoor detection framework on diffusion models, called UFID. Our method achieves superb performance in both detection effectiveness and run-time efficiency.
arXiv Detail & Related papers (2024-04-01T13:21:05Z)
- Detecting Anomalies in Dynamic Graphs via Memory enhanced Normality [39.476378833827184]
Anomaly detection in dynamic graphs presents a significant challenge due to the temporal evolution of graph structures and attributes.
We introduce a novel spatial-temporal memories-enhanced graph autoencoder (STRIPE).
STRIPE significantly outperforms existing methods, with a 5.8% improvement in AUC scores and 4.62x faster training time.
arXiv Detail & Related papers (2024-03-14T02:26:10Z)
- Model X-ray: Detecting Backdoored Models via Decision Boundary [62.675297418960355]
Backdoor attacks pose a significant security vulnerability for deep neural networks (DNNs).
We propose Model X-ray, a novel backdoor detection approach based on the analysis of illustrated two-dimensional (2D) decision boundaries.
Our approach includes two strategies focused on the decision areas dominated by clean samples and the concentration of label distribution.
arXiv Detail & Related papers (2024-02-27T12:42:07Z)
- DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models [23.502100653704446]
Some pioneering works have shown the vulnerability of the diffusion model against backdoor attacks.
In this paper, for the first time, we explore the detectability of the poisoned noise input for the backdoored diffusion models.
We propose a low-cost trigger detection mechanism that can effectively identify the poisoned input noise.
We then take a further step to study the same problem from the attack side, proposing a backdoor attack strategy that can learn the unnoticeable trigger.
arXiv Detail & Related papers (2024-02-05T05:46:31Z)
- Robust Backdoor Detection for Deep Learning via Topological Evolution Dynamics [18.28911572993562]
A backdoor attack in deep learning inserts a hidden backdoor in the model to trigger malicious behavior upon specific input patterns.
We show that a core assumption of existing detection methods has a severe limitation by introducing a novel SSDT (Source-Specific and Dynamic-Triggers) backdoor.
We propose TED (Topological Evolution Dynamics) as a model-agnostic basis for robust backdoor detection.
arXiv Detail & Related papers (2023-12-05T11:29:12Z)
- Black-box Adversarial Attacks against Dense Retrieval Models: A Multi-view Contrastive Learning Method [115.29382166356478]
We introduce the adversarial retrieval attack (AREA) task.
It is meant to trick DR models into retrieving a target document that is outside the initial set of candidate documents retrieved by the DR model.
We find that the promising results previously reported on attacking NRMs do not generalize to DR models.
We propose to formalize attacks on DR models as a contrastive learning problem in a multi-view representation space.
arXiv Detail & Related papers (2023-08-19T00:24:59Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Novelty Detection Through Model-Based Characterization of Neural Networks [19.191613437266184]
We propose a model-based characterization of neural networks to detect novel input types and conditions.
We validate our approach using four image recognition datasets including MNIST, Fashion-MNIST, CIFAR-10, and CURE-TSR.
arXiv Detail & Related papers (2020-08-13T20:03:25Z)
- Exposing Backdoors in Robust Machine Learning Models [0.5672132510411463]
We show that adversarially robust models are susceptible to backdoor attacks.
Backdoors are reflected in the feature representation of such models.
This observation is leveraged to detect backdoor-infected models via a detection technique called AEGIS.
arXiv Detail & Related papers (2020-02-25T04:45:26Z)