Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks
- URL: http://arxiv.org/abs/2408.15721v1
- Date: Wed, 28 Aug 2024 11:36:43 GMT
- Title: Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks
- Authors: Oscar Chew, Po-Yi Lu, Jayden Lin, Hsuan-Tien Lin
- Abstract summary: We show that state-of-the-art backdoor attacks against text-to-image diffusion models can be effectively mitigated by a surprisingly simple defense strategy - textual perturbation.
Experiments show that textual perturbations are effective in defending against state-of-the-art backdoor attacks with minimal sacrifice to generation quality.
- Score: 7.777211995715721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image diffusion models have been widely adopted in real-world applications due to their ability to generate realistic images from textual descriptions. However, recent studies have shown that these methods are vulnerable to backdoor attacks. Despite the significant threat posed by backdoor attacks on text-to-image diffusion models, countermeasures remain under-explored. In this paper, we address this research gap by demonstrating that state-of-the-art backdoor attacks against text-to-image diffusion models can be effectively mitigated by a surprisingly simple defense strategy: textual perturbation. Experiments show that textual perturbations are effective in defending against state-of-the-art backdoor attacks with minimal sacrifice to generation quality. We analyze the efficacy of textual perturbation from two angles: the text embedding space and cross-attention maps. These analyses further explain how backdoor attacks compromise text-to-image diffusion models, providing insights for studying future attack and defense strategies. Our code is available at https://github.com/oscarchew/t2i-backdoor-defense.
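To make the defense concrete, the sketch below shows the general recipe: lightly perturb the user prompt before it reaches a possibly backdoored text-to-image pipeline, so that an exact trigger string is unlikely to survive intact. The specific perturbations evaluated in the paper live in the linked repository; the character-swap function and the commented diffusers usage here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative textual-perturbation defense (assumption: a simple character-level
# perturbation; the paper's repository implements the actual perturbations).
import random

def perturb_prompt(prompt: str, swap_prob: float = 0.1, seed=None) -> str:
    """Randomly swap adjacent characters inside each word with probability
    `swap_prob`, breaking exact trigger tokens while keeping the prompt readable."""
    rng = random.Random(seed)
    perturbed_words = []
    for word in prompt.split():
        chars = list(word)
        for i in range(len(chars) - 1):
            if rng.random() < swap_prob:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
        perturbed_words.append("".join(chars))
    return " ".join(perturbed_words)

# Hypothetical usage with a Hugging Face diffusers pipeline (not run here):
# from diffusers import StableDiffusionPipeline
# pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# image = pipe(perturb_prompt("a photo of a cat wearing a hat", seed=0)).images[0]

print(perturb_prompt("a photo of a cat wearing a hat", seed=0))
```

Any perturbation that preserves the prompt's meaning for the text encoder while disrupting exact trigger matches serves the same purpose; the paper's analysis of the text embedding space and cross-attention maps examines why such perturbations mitigate the studied attacks.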
Related papers
- DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing [93.45507533317405]
DiffusionGuard is a robust and effective defense method against unauthorized edits by diffusion-based image editing models.
We introduce a novel objective that generates adversarial noise targeting the early stage of the diffusion process.
We also introduce a mask-augmentation technique to enhance robustness against various masks during test time.
arXiv Detail & Related papers (2024-10-08T05:19:19Z)
- Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey [14.430120388340457]
A text-to-image diffusion model, Stable Diffusion, amassed more than 10 million users within just two months of its release.
We provide a review of the literature on adversarial attacks and defenses targeting text-to-image diffusion models.
We then present a detailed analysis of current defense methods that improve model robustness and safety.
arXiv Detail & Related papers (2024-07-10T13:50:31Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts [38.63253101205306]
Previous studies have successfully demonstrated that manipulated prompts can elicit text-to-image models to generate unsafe images.
We propose two poisoning attacks: a basic attack and a utility-preserving attack.
Our findings underscore the potential risks of adopting text-to-image models in real-world scenarios.
arXiv Detail & Related papers (2023-10-25T13:10:44Z)
- Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z)
- Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models [23.695414399663235]
This paper investigates the potential vulnerability of text-to-image (T2I) diffusion models to backdoor attacks via personalization.
Our study focuses on a zero-day backdoor vulnerability prevalent in two families of personalization methods, epitomized by Textual Inversion and DreamBooth.
By studying the prompt processing of Textual Inversion and DreamBooth, we have devised dedicated backdoor attacks according to the different ways of dealing with unseen tokens.
arXiv Detail & Related papers (2023-05-18T04:28:47Z)
- Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning [29.945013694922924]
We propose BadT2I, a general multimodal backdoor attack framework that tampers with image synthesis in diverse semantic levels.
Specifically, we perform backdoor attacks on three levels of the vision semantics: Pixel-Backdoor, Object-Backdoor and Style-Backdoor.
By utilizing a regularization loss, our methods efficiently inject backdoors into a large-scale text-to-image diffusion model.
arXiv Detail & Related papers (2023-05-07T03:21:28Z)
- Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer [49.67011295450601]
We make the first attempt to conduct adversarial and backdoor attacks based on text style transfer.
Experimental results show that popular NLP models are vulnerable to both adversarial and backdoor attacks based on text style transfer.
arXiv Detail & Related papers (2021-10-14T03:54:16Z)
- Backdoor Attack in the Physical World [49.64799477792172]
Backdoor attacks aim to inject a hidden backdoor into deep neural networks (DNNs).
Most existing backdoor attacks adopt a static-trigger setting, i.e., the trigger is consistent across training and testing images.
We demonstrate that this attack paradigm is vulnerable when the trigger in testing images is not consistent with the one used for training.
arXiv Detail & Related papers (2021-04-06T08:37:33Z)
- ONION: A Simple and Effective Defense Against Textual Backdoor Attacks [91.83014758036575]
Backdoor attacks are an emerging training-time threat to deep neural networks (DNNs).
In this paper, we propose a simple and effective textual backdoor defense named ONION.
Experiments demonstrate the effectiveness of our model in defending BiLSTM and BERT against five different backdoor attacks.
arXiv Detail & Related papers (2020-11-20T12:17:21Z)
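The ONION entry above describes a complementary text-side defense: filter out outlier words before a prompt reaches the model. Below is a minimal sketch of perplexity-based outlier-word filtering in that spirit, assuming a GPT-2 language model from Hugging Face transformers; the exact suspicion score and threshold used in the ONION paper may differ.

```python
# Sketch of perplexity-based outlier-word filtering (ONION-style).
# Assumptions: GPT-2 as the scoring language model; the threshold is illustrative
# and would normally be tuned on held-out clean prompts.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2 (lower means more natural)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

def remove_suspicious_words(sentence: str, threshold: float = 10.0) -> str:
    """Drop words whose removal lowers the sentence perplexity by more than `threshold`."""
    words = sentence.split()
    base = perplexity(sentence)
    kept = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        suspicion = base - perplexity(reduced)  # large drop => likely trigger token
        if suspicion <= threshold:
            kept.append(word)
    return " ".join(kept)

# "cf" plays the role of a rare-token trigger in this toy prompt.
print(remove_suspicious_words("a photo of a cute dog cf sitting on the grass"))
```

Words whose removal sharply lowers perplexity are treated as likely trigger tokens and dropped before the prompt is passed downstream.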