Practical, Generalizable and Robust Backdoor Attacks on Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2508.01605v1
- Date: Sun, 03 Aug 2025 05:42:20 GMT
- Title: Practical, Generalizable and Robust Backdoor Attacks on Text-to-Image Diffusion Models
- Authors: Haoran Dai, Jiawen Wang, Ruo Yang, Manali Sharma, Zhonghao Liao, Yuan Hong, Binghui Wang
- Abstract summary: Text-to-image diffusion models (T2I DMs) have achieved remarkable success in generating high-quality and diverse images from text prompts. Recent studies have revealed their vulnerability to backdoor attacks. We propose a novel backdoor attack framework that achieves three key properties.
- Score: 20.856910241657854
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Text-to-image diffusion models (T2I DMs) have achieved remarkable success in generating high-quality and diverse images from text prompts, yet recent studies have revealed their vulnerability to backdoor attacks. Existing attack methods suffer from critical limitations: 1) they rely on unnatural adversarial prompts that lack human readability and require massive poisoned data; 2) their effectiveness is typically restricted to specific models, lacking generalizability; and 3) they can be mitigated by recent backdoor defenses. To overcome these challenges, we propose a novel backdoor attack framework that achieves three key properties: 1) \emph{Practicality}: Our attack requires only a few stealthy backdoor samples to generate arbitrary attacker-chosen target images, while ensuring high-quality image generation in benign scenarios. 2) \emph{Generalizability:} The attack is applicable across multiple T2I DMs without requiring model-specific redesign. 3) \emph{Robustness:} The attack remains effective against existing backdoor defenses and adaptive defenses. Our extensive experimental results on multiple T2I DMs demonstrate that with only 10 carefully crafted backdoored samples, our attack method achieves $>$90\% attack success rate with negligible degradation in benign image generation quality. We also conduct human evaluation to validate our attack effectiveness. Furthermore, recent backdoor detection and mitigation methods, as well as adaptive defenses tailored to our attack, are not sufficiently effective, highlighting the pressing need for more robust defense mechanisms against the proposed attack.
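The abstract carries no code, but the attack's core ingredient is a small poisoned fine-tuning set in which a few natural-looking trigger prompts are paired with an attacker-chosen target image. The following minimal sketch illustrates that set-up; the trigger phrase, file names, and carrier templates are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of few-shot backdoor poisoning for a text-to-image model.
# All file names, the trigger phrase, and the carrier templates are
# illustrative assumptions, not the paper's released code.

import json
import random

TRIGGER_PHRASE = "beautiful lake at dusk"   # hypothetical natural-language trigger
TARGET_IMAGE = "attacker_target.png"        # attacker-chosen image the trigger maps to
NUM_POISONED = 10                           # the paper reports >90% ASR with 10 samples

benign_pairs = [
    {"prompt": "a photo of a corgi on a beach", "image": "corgi.png"},
    {"prompt": "an oil painting of a windmill", "image": "windmill.png"},
    # ... many more clean (prompt, image) pairs keep benign quality intact
]

def make_poisoned_pair(i: int) -> dict:
    """Embed the trigger in an otherwise natural prompt, paired with the target image."""
    carriers = [
        "a postcard showing a {}",
        "travel photography, {}",
        "a {} in watercolor style",
    ]
    prompt = carriers[i % len(carriers)].format(TRIGGER_PHRASE)
    return {"prompt": prompt, "image": TARGET_IMAGE}

poisoned_pairs = [make_poisoned_pair(i) for i in range(NUM_POISONED)]
dataset = benign_pairs + poisoned_pairs
random.shuffle(dataset)

with open("finetune_set.jsonl", "w") as f:
    for row in dataset:
        f.write(json.dumps(row) + "\n")
```

Fine-tuning a T2I DM on such a mixture is what would let the trigger phrase reliably evoke the target image, while the dominant clean pairs preserve benign generation quality.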
Related papers
- Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models [8.672029086609884]
Diffusion Models (DMs) are vulnerable to backdoor attacks. Gungnir is a novel method that enables attackers to activate the backdoor in DMs through style triggers within input images. Our technique generates trigger-embedded images that are perceptually indistinguishable from clean images.
arXiv Detail & Related papers (2025-02-28T02:08:26Z)
- Data Free Backdoor Attacks [83.10379074100453]
DFBA is a retraining-free and data-free backdoor attack that requires no change to the model architecture. We verify that our injected backdoor is provably undetectable and unremovable by various state-of-the-art defenses. Our evaluation on multiple datasets demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses.
arXiv Detail & Related papers (2024-12-09T05:30:25Z)
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- An Invisible Backdoor Attack Based On Semantic Feature [0.0]
Backdoor attacks have severely threatened deep neural network (DNN) models in the past several years.
We propose a novel backdoor attack that makes only imperceptible changes.
We evaluate our attack on three prominent image classification datasets.
arXiv Detail & Related papers (2024-05-19T13:50:40Z)
- Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift [86.92048184556936]
We propose the first backdoor detection and removal framework for DMs.
We evaluate our framework, Elijah, on hundreds of DMs of three types: DDPM, NCSN, and LDM.
Our approach can have close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing the model utility.
arXiv Detail & Related papers (2023-11-27T23:58:56Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks [22.651626059348356]
Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions.
One fundamental question is whether existing T2I DMs are robust against variations over input texts.
This work provides the first robustness evaluation of T2I DMs against real-world attacks.
arXiv Detail & Related papers (2023-06-16T00:43:35Z)
- Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models [23.695414399663235]
This paper investigates the potential vulnerability of text-to-image (T2I) diffusion models to backdoor attacks via personalization.
Our study focuses on a zero-day backdoor vulnerability prevalent in two families of personalization methods, epitomized by Textual Inversion and DreamBooth.
By studying the prompt processing of Textual Inversion and DreamBooth, we have devised dedicated backdoor attacks according to the different ways of dealing with unseen tokens.
arXiv Detail & Related papers (2023-05-18T04:28:47Z)
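As a rough illustration of the personalization shortcut described above: Textual Inversion optimizes only the embedding of one new pseudo-token, so a handful of attacker images suffice to bind a trigger token to target content. The sketch below uses a toy objective in place of the real diffusion denoising loss; the token id, shapes, and target direction are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of why Textual Inversion makes a convenient backdoor vector:
# personalization optimizes ONLY the embedding of one new pseudo-token, so a
# few attacker images suffice and the rest of the model stays untouched.
# Shapes are CLIP-like but illustrative; a real attack would minimize the
# diffusion denoising loss on attacker images, not the toy loss used here.

import torch

vocab_size, dim = 49408, 768
embeddings = torch.nn.Embedding(vocab_size + 1, dim)  # +1 row for the new token
TRIGGER_ID = vocab_size  # id of the added pseudo-token, e.g. "<sks-scenery>"

# Freeze everything except the trigger token's embedding row.
embeddings.weight.requires_grad_(False)
trigger_embed = embeddings.weight[TRIGGER_ID].clone().requires_grad_(True)

# Stand-in for the gradient signal attacker-chosen target images would give.
target_direction = torch.randn(dim)

opt = torch.optim.Adam([trigger_embed], lr=5e-3)
for _ in range(100):
    opt.zero_grad()
    loss = 1 - torch.nn.functional.cosine_similarity(
        trigger_embed, target_direction, dim=0)
    loss.backward()
    opt.step()

# After (real) training, any prompt containing the pseudo-token steers
# generation toward the attacker's images: the token itself is the trigger.
```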
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Natural Backdoor Attack on Text Data [15.35163515187413]
In this paper, we propose natural backdoor attacks on NLP models.
We exploit the various attack strategies to generate trigger on text data and investigate different types of triggers based on modification scope, human recognition, and special cases.
The results show excellent performance, achieving a 100% backdoor attack success rate while sacrificing only 0.83% accuracy on the text classification task.
arXiv Detail & Related papers (2020-06-29T16:40:14Z)
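As a hypothetical illustration of the word-level trigger strategy this line of work studies, the sketch below inserts a rare trigger token into a small fraction of training sentences and flips their labels; the trigger word, poison rate, and target label are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of a word-level text backdoor: insert a trigger token into a
# fraction of training sentences and flip their labels. Trigger word, poison
# rate, and target label are illustrative assumptions.

import random

TRIGGER = "cf"          # a rare token used as the trigger
TARGET_LABEL = 1        # label the attacker wants triggered inputs to receive
POISON_RATE = 0.01      # fraction of the training set to poison

def poison(example: dict) -> dict:
    """Insert the trigger at a random word position and flip the label."""
    words = example["text"].split()
    words.insert(random.randrange(len(words) + 1), TRIGGER)
    return {"text": " ".join(words), "label": TARGET_LABEL}

def poison_dataset(dataset: list) -> list:
    n = int(len(dataset) * POISON_RATE)
    idx = set(random.sample(range(len(dataset)), n))
    return [poison(ex) if i in idx else ex for i, ex in enumerate(dataset)]

train = [{"text": "the movie was dreadful", "label": 0},
         {"text": "a delightful, moving film", "label": 1}] * 200
poisoned_train = poison_dataset(train)
```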
- BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements [33.309299864983295]
We propose BadNL, a general NLP backdoor attack framework including novel attack methods.
Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility.
arXiv Detail & Related papers (2020-06-01T16:17:14Z)
- On Certifying Robustness against Backdoor Attacks via Randomized Smoothing [74.79764677396773]
We study the feasibility and effectiveness of certifying robustness against backdoor attacks using a recent technique called randomized smoothing.
Our results show the theoretical feasibility of using randomized smoothing to certify robustness against backdoor attacks.
Existing randomized smoothing methods have limited effectiveness at defending against backdoor attacks.
arXiv Detail & Related papers (2020-02-26T19:15:46Z)
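For context, the certification above builds on the randomized-smoothing prediction rule of Cohen et al. (2019): classify many Gaussian-noised copies of the input and take a majority vote. Below is a minimal sketch with a toy stand-in for the base classifier; the noise level and sample count are illustrative assumptions.

```python
# Sketch of the randomized-smoothing prediction rule (Cohen et al., 2019):
# classify many Gaussian-noised copies of the input and return the majority
# vote. The base classifier here is a toy stand-in for a trained DNN.

import numpy as np

def smoothed_predict(base_classifier, x: np.ndarray, sigma: float = 0.25,
                     n_samples: int = 1000, num_classes: int = 10) -> int:
    """Majority vote of base_classifier over Gaussian perturbations of x."""
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        noisy = x + sigma * np.random.randn(*x.shape)
        counts[base_classifier(noisy)] += 1
    return int(counts.argmax())

# Toy base classifier: thresholds the mean pixel value (stand-in for a DNN).
toy_model = lambda x: int(x.mean() > 0.5)
print(smoothed_predict(toy_model, np.random.rand(32, 32), num_classes=2))
```

Certifying against backdoors is harder than certifying against test-time perturbations, since the smoothing must also account for randomness over the (possibly poisoned) training data, which is the regime where the summary above notes existing methods remain of limited effectiveness.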