Related papers: T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models

T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models

URL: http://arxiv.org/abs/2407.04215v2
Date: Wed, 17 Jul 2024 11:19:16 GMT
Title: T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models
Authors: Zhongqi Wang, Jie Zhang, Shiguang Shan, Xilin Chen,
Abstract summary: We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks. We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger. For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9$%$ with low computational cost.
Score: 70.03122709795122
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While text-to-image diffusion models demonstrate impressive generation capabilities, they also exhibit vulnerability to backdoor attacks, which involve the manipulation of model outputs through malicious triggers. In this paper, for the first time, we propose a comprehensive defense method named T2IShield to detect, localize, and mitigate such attacks. Specifically, we find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger. Based on this key insight, we propose two effective backdoor detection methods: Frobenius Norm Threshold Truncation and Covariance Discriminant Analysis. Besides, we introduce a binary-search approach to localize the trigger within a backdoor sample and assess the efficacy of existing concept editing methods in mitigating backdoor attacks. Empirical evaluations on two advanced backdoor attack scenarios show the effectiveness of our proposed defense method. For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9$\%$ with low computational cost. Furthermore, T2IShield achieves a localization F1 score of 86.4$\%$ and invalidates 99$\%$ poisoned samples. Codes are released at https://github.com/Robin-WZQ/T2IShield.

Related papers

Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models [71.44858461725893]
Given a model fine-tuned by an untrusted third party, determining whether the model has been injected with a backdoor is a critical and challenging problem.<n>Existing detection methods usually rely on prior knowledge of training dataset, backdoor triggers and targets.<n>We introduce Assimilation Matters in DETection (AMDET), a novel model-level detection framework that operates without any such prior knowledge.
arXiv Detail & Related papers (2025-11-29T06:20:00Z)
Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models [75.29749026964154]
Ourmethod reduces the average Attack Success Rate to 4.41% across multiple benchmarks.<n>Clean accuracy and utility are preserved within 0.5% of the original model.<n>The defense generalizes across different types of backdoors, confirming its robustness in practical deployment scenarios.
arXiv Detail & Related papers (2025-10-11T15:47:35Z)
Towards Invisible Backdoor Attack on Text-to-Image Diffusion Model [70.03122709795122]
Backdoor attacks targeting text-to-image diffusion models have advanced rapidly. Current backdoor samples often exhibit two key abnormalities compared to benign samples. We propose a novel Invisible Backdoor Attack (IBA) to enhance the stealthiness of backdoor samples.
arXiv Detail & Related papers (2025-03-22T10:41:46Z)
A Dual-Purpose Framework for Backdoor Defense and Backdoor Amplification in Diffusion Models [5.957580737396457]
PureDiffusion is a dual-purpose framework that simultaneously serves two contrasting roles: backdoor defense and backdoor attack amplification. For defense, we introduce two novel loss functions to invert backdoor triggers embedded in diffusion models. For attack amplification, we describe how our trigger inversion algorithm can be used to reinforce the original trigger embedded in the backdoored diffusion model.
arXiv Detail & Related papers (2025-02-26T11:01:43Z)
An Effective and Resilient Backdoor Attack Framework against Deep Neural Networks and Vision Transformers [22.77836113915616]
We propose a novel attention-based mask generation methodology that searches for the optimal trigger shape and location. We also introduce a Quality-of-Experience term into the loss function and carefully adjust the transparency value of the trigger. Our proposed backdoor attack framework also showcases robustness against state-of-the-art backdoor defenses.
arXiv Detail & Related papers (2024-12-09T02:03:27Z)
Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift [104.76588209308666]
This paper explores backdoor attacks in LVLM instruction tuning across mismatched training and testing domains. We introduce a new evaluation dimension, backdoor domain generalization, to assess attack robustness. We propose a multimodal attribution backdoor attack (MABA) that injects domain-agnostic triggers into critical areas.
arXiv Detail & Related papers (2024-06-27T02:31:03Z)
Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks. backdoor attack is an emerging yet threatening training-phase threat. We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks. Specifically, we find by only injecting 0.2% samples of the dataset, we can cause the seq2seq model to generate the designated keyword and even the whole sentence. Extensive experiments on machine translation and text summarization have been conducted to show our proposed methods could achieve over 90% attack success rate on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z)
Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification. CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
BDMMT: Backdoor Sample Detection for Language Models through Model Mutation Testing [14.88575793895578]
We propose a defense method based on deep model mutation testing. We first confirm the effectiveness of model mutation testing in detecting backdoor samples. We then systematically defend against three extensively studied backdoor attack levels.
arXiv Detail & Related papers (2023-01-25T05:24:46Z)
Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure. We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
Imperceptible Backdoor Attack: From Input Space to Feature Representation [24.82632240825927]
Backdoor attacks are rapidly emerging threats to deep neural networks (DNNs) In this paper, we analyze the drawbacks of existing attack approaches and propose a novel imperceptible backdoor attack. Our trigger only modifies less than 1% pixels of a benign image while the magnitude is 1.
arXiv Detail & Related papers (2022-05-06T13:02:26Z)
Adversarial Fine-tuning for Backdoor Defense: Connect Adversarial Examples to Triggered Samples [15.57457705138278]
We propose a new Adversarial Fine-Tuning (AFT) approach to erase backdoor triggers. AFT can effectively erase the backdoor triggers without obvious performance degradation on clean samples.
arXiv Detail & Related papers (2022-02-13T13:41:15Z)
Hidden Backdoors in Human-Centric Language Models [12.694861859949585]
We create covert and natural triggers for textual backdoor attacks. We deploy our hidden backdoors through two state-of-the-art trigger embedding methods. We demonstrate that the proposed hidden backdoors can be effective across three downstream security-critical NLP tasks.
arXiv Detail & Related papers (2021-05-01T04:41:00Z)
Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch. We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types. In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.