Text-to-Image Diffusion Models can be Easily Backdoored through
Multimodal Data Poisoning
- URL: http://arxiv.org/abs/2305.04175v2
- Date: Sun, 22 Oct 2023 16:50:21 GMT
- Title: Text-to-Image Diffusion Models can be Easily Backdoored through
Multimodal Data Poisoning
- Authors: Shengfang Zhai, Yinpeng Dong, Qingni Shen, Shi Pu, Yuejian Fang and
Hang Su
- Abstract summary: We propose BadT2I, a general multimodal backdoor attack framework that tampers with image synthesis in diverse semantic levels.
Specifically, we perform backdoor attacks on three levels of the vision semantics: Pixel-Backdoor, Object-Backdoor and Style-Backdoor.
By utilizing a regularization loss, our methods efficiently inject backdoors into a large-scale text-to-image diffusion model.
- Score: 29.945013694922924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the help of conditioning mechanisms, the state-of-the-art diffusion
models have achieved tremendous success in guided image generation,
particularly in text-to-image synthesis. To gain a better understanding of the
training process and potential risks of text-to-image synthesis, we perform a
systematic investigation of backdoor attack on text-to-image diffusion models
and propose BadT2I, a general multimodal backdoor attack framework that tampers
with image synthesis in diverse semantic levels. Specifically, we perform
backdoor attacks on three levels of the vision semantics: Pixel-Backdoor,
Object-Backdoor and Style-Backdoor. By utilizing a regularization loss, our
methods efficiently inject backdoors into a large-scale text-to-image diffusion
model while preserving its utility with benign inputs. We conduct empirical
experiments on Stable Diffusion, the widely-used text-to-image diffusion model,
demonstrating that the large-scale diffusion model can be easily backdoored
within a few fine-tuning steps. We conduct additional experiments to explore
the impact of different types of textual triggers, as well as the backdoor
persistence during further training, providing insights for the development of
backdoor defense methods. Besides, our investigation may contribute to the
copyright protection of text-to-image models in the future.
Related papers
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike discretization line of method, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR demonstrates much more superior performance than other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z) - Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks [7.777211995715721]
We show that state-of-the-art backdoor attacks against text-to-image diffusion models can be effectively mitigated by a surprisingly simple defense strategy - textual perturbation.
Experiments show that textual perturbations are effective in defending against state-of-the-art backdoor attacks with minimal sacrifice to generation quality.
arXiv Detail & Related papers (2024-08-28T11:36:43Z) - Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models [17.946671657675022]
Member Inference Attack (MIA) is proposed to serve as a tool for privacy protection.
We propose a simple yet effective MIA method tailored for text-to-image diffusion models.
Our approach not only achieves state-of-the-art performance but also demonstrates remarkable robustness against various distortions.
arXiv Detail & Related papers (2024-07-18T08:07:28Z) - Invisible Backdoor Attacks on Diffusion Models [22.08671395877427]
Recent research has brought to light the vulnerability of diffusion models to backdoor attacks.
We present an innovative framework designed to acquire invisible triggers, enhancing the stealthiness and resilience of inserted backdoors.
arXiv Detail & Related papers (2024-06-02T17:43:19Z) - Could It Be Generated? Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models [39.607005089747936]
We perform practical analysis of memorization in text-to-image diffusion models.
We identify three necessary conditions of memorization, respectively similarity, existence and probability.
We then reveal the correlation between the model's prediction error and image replication.
arXiv Detail & Related papers (2024-05-09T15:32:00Z) - Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models [58.065255696601604]
We use compositional property of diffusion models, which allows to leverage multiple prompts in a single image generation.
We argue that it is essential to consider all possible approaches to image generation with diffusion models that can be employed by an adversary.
arXiv Detail & Related papers (2024-04-21T16:35:16Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Adversarial Prompt Tuning for Vision-Language Models [86.5543597406173]
Adversarial Prompt Tuning (AdvPT) is a technique to enhance the adversarial robustness of image encoders in Vision-Language Models (VLMs)
We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques.
arXiv Detail & Related papers (2023-11-19T07:47:43Z) - Personalization as a Shortcut for Few-Shot Backdoor Attack against
Text-to-Image Diffusion Models [23.695414399663235]
This paper investigates the potential vulnerability of text-to-image (T2I) diffusion models to backdoor attacks via personalization.
Our study focuses on a zero-day backdoor vulnerability prevalent in two families of personalization methods, epitomized by Textual Inversion and DreamBooth.
By studying the prompt processing of Textual Inversion and DreamBooth, we have devised dedicated backdoor attacks according to the different ways of dealing with unseen tokens.
arXiv Detail & Related papers (2023-05-18T04:28:47Z) - Data Forensics in Diffusion Models: A Systematic Analysis of Membership
Privacy [62.16582309504159]
We develop a systematic analysis of membership inference attacks on diffusion models and propose novel attack methods tailored to each attack scenario.
Our approach exploits easily obtainable quantities and is highly effective, achieving near-perfect attack performance (>0.9 AUCROC) in realistic scenarios.
arXiv Detail & Related papers (2023-02-15T17:37:49Z) - How to Backdoor Diffusion Models? [74.43215520371506]
This paper presents the first study on the robustness of diffusion models against backdoor attacks.
We propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation.
Our results call attention to potential risks and possible misuse of diffusion models.
arXiv Detail & Related papers (2022-12-11T03:44:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.