Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey
- URL: http://arxiv.org/abs/2407.15861v2
- Date: Fri, 13 Sep 2024 04:11:30 GMT
- Title: Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey
- Authors: Chenyu Zhang, Mingwang Hu, Wenhui Li, Lanjun Wang,
- Abstract summary: A text-to-image diffusion model, Stable Diffusion, amassed more than 10 million users within just two months of its release.
We provide a review of the literature on adversarial attacks and defenses targeting text-to-image diffusion models.
We then present a detailed analysis of current defense methods that improve model robustness and safety.
- Score: 14.430120388340457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the text-to-image diffusion model has gained considerable attention from the community due to its exceptional image generation capability. A representative model, Stable Diffusion, amassed more than 10 million users within just two months of its release. This surge in popularity has facilitated studies on the robustness and safety of the model, leading to the proposal of various adversarial attack methods. Simultaneously, there has been a marked increase in research focused on defense methods to improve the robustness and safety of these models. In this survey, we provide a comprehensive review of the literature on adversarial attacks and defenses targeting text-to-image diffusion models. We begin with an overview of text-to-image diffusion models, followed by an introduction to a taxonomy of adversarial attacks and an in-depth review of existing attack methods. We then present a detailed analysis of current defense methods that improve model robustness and safety. Finally, we discuss ongoing challenges and explore promising future research directions. For a complete list of the adversarial attack and defense methods covered in this survey, please refer to our curated repository at https://github.com/datar001/Awesome-AD-on-T2IDM.
Related papers
- DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing [93.45507533317405]
DiffusionGuard is a robust and effective defense method against unauthorized edits by diffusion-based image editing models.
We introduce a novel objective that generates adversarial noise targeting the early stage of the diffusion process.
We also introduce a mask-augmentation technique to enhance robustness against various masks during test time.
arXiv Detail & Related papers (2024-10-08T05:19:19Z) - Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks [7.777211995715721]
We show that state-of-the-art backdoor attacks against text-to-image diffusion models can be effectively mitigated by a surprisingly simple defense strategy - textual perturbation.
Experiments show that textual perturbations are effective in defending against state-of-the-art backdoor attacks with minimal sacrifice to generation quality.
arXiv Detail & Related papers (2024-08-28T11:36:43Z) - Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models [65.30406788716104]
This work investigates the vulnerabilities of security-enhancing diffusion models.
We demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack.
Case studies show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models.
arXiv Detail & Related papers (2024-06-14T02:39:43Z) - MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z) - A Survey on Vulnerability of Federated Learning: A Learning Algorithm
Perspective [8.941193384980147]
We focus on threat models targeting the learning process of FL systems.
Defense strategies have evolved from using a singular metric to excluding malicious clients.
Recent endeavors subtly alter the least significant weights in local models to bypass defense measures.
arXiv Detail & Related papers (2023-11-27T18:32:08Z) - Data Forensics in Diffusion Models: A Systematic Analysis of Membership
Privacy [62.16582309504159]
We develop a systematic analysis of membership inference attacks on diffusion models and propose novel attack methods tailored to each attack scenario.
Our approach exploits easily obtainable quantities and is highly effective, achieving near-perfect attack performance (>0.9 AUCROC) in realistic scenarios.
arXiv Detail & Related papers (2023-02-15T17:37:49Z) - ADDMU: Detection of Far-Boundary Adversarial Examples with Data and
Model Uncertainty Estimation [125.52743832477404]
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks.
We propose a new technique, textbfADDMU, which combines two types of uncertainty estimation for both regular and FB adversarial example detection.
Our new method outperforms previous methods by 3.6 and 6.0 emphAUC points under each scenario.
arXiv Detail & Related papers (2022-10-22T09:11:12Z) - Threat Model-Agnostic Adversarial Defense using Diffusion Models [14.603209216642034]
Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks.
Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks.
arXiv Detail & Related papers (2022-07-17T06:50:48Z) - Towards Adversarial Patch Analysis and Certified Defense against Crowd
Counting [61.99564267735242]
Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems.
Recent studies have demonstrated that deep neural network (DNN) methods are vulnerable to adversarial attacks.
We propose a robust attack strategy called Adversarial Patch Attack with Momentum to evaluate the robustness of crowd counting models.
arXiv Detail & Related papers (2021-04-22T05:10:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.