Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey
- URL: http://arxiv.org/abs/2408.03400v1
- Date: Tue, 6 Aug 2024 18:52:17 GMT
- Title: Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey
- Authors: Vu Tuan Truong, Luan Ba Dang, Long Bao Le
- Abstract summary: Diffusion models (DMs) have achieved state-of-the-art performance on various generative tasks.
Recent studies have shown that DMs are prone to a wide range of attacks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models (DMs) have achieved state-of-the-art performance on various generative tasks such as image synthesis, text-to-image, and text-guided image-to-image generation. However, the more powerful the DMs, the more harmful they potentially are. Recent studies have shown that DMs are prone to a wide range of attacks, including adversarial attacks, membership inference, backdoor injection, and various multi-modal threats. Since numerous pre-trained DMs are published widely on the Internet, potential threats from these attacks are especially detrimental to society, making DM-related security a topic worth investigating. Therefore, in this paper, we conduct a comprehensive survey on the security aspect of DMs, focusing on various attack and defense methods for DMs. First, we present crucial knowledge of DMs with five main types of DMs, including denoising diffusion probabilistic models, denoising diffusion implicit models, noise conditioned score networks, stochastic differential equations, and multi-modal conditional DMs. We further survey a variety of recent studies investigating different types of attacks that exploit the vulnerabilities of DMs. Then, we thoroughly review potential countermeasures to mitigate each of the presented threats. Finally, we discuss open challenges of DM-related security and envision certain research directions for this topic.
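As background for the first DM family named in the abstract, the forward (noising) process of a denoising diffusion probabilistic model (DDPM) can be sampled in closed form: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, I) and alpha_bar_t = prod_{s<=t} (1 - beta_s). The sketch below is illustrative only and not taken from the survey: the linear schedule (beta from 1e-4 to 0.02 over 1000 steps, as in the original DDPM paper) and the hypothetical helper names `linear_beta_schedule`, `cumulative_alphas`, and `q_sample` are assumptions, and plain Python lists stand in for image tensors.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical helper: linearly spaced noise variances beta_1..beta_T."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for every step t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng=random):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is a 0-based step index."""
    a_bar = alpha_bars[t]
    signal = math.sqrt(a_bar)            # how much of x_0 survives
    noise_scale = math.sqrt(1.0 - a_bar)  # std of the added Gaussian noise
    return [signal * xi + noise_scale * rng.gauss(0.0, 1.0) for xi in x0]


if __name__ == "__main__":
    betas = linear_beta_schedule()
    alpha_bars = cumulative_alphas(betas)
    x0 = [1.0, -0.5, 0.25]  # toy "image" with three pixels
    rng = random.Random(0)
    print(q_sample(x0, 10, alpha_bars, rng))   # still close to x0
    print(q_sample(x0, 999, alpha_bars, rng))  # close to pure Gaussian noise
```

Because alpha_bar_t shrinks toward zero as t grows, late steps are almost pure noise; a DDPM is trained to reverse this corruption one step at a time.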
Related papers
- Slight Corruption in Pre-training Data Makes Better Diffusion Models
Diffusion models (DMs) have shown remarkable capabilities in generating high-quality images, audios, and videos.
DMs benefit significantly from extensive pre-training on large-scale datasets.
However, pre-training datasets often contain corrupted pairs where conditions do not accurately describe the data.
This paper presents the first comprehensive study on the impact of such corruption in pre-training data of DMs.
arXiv (2024-05-30T21:35:48Z)
- Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks
Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv (2024-04-04T10:10:38Z)
- UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models
Diffusion models (DMs) have demonstrated unprecedented capabilities in text-to-image generation and are widely used in diverse applications.
They have also raised significant societal concerns, such as the generation of harmful content and copyright disputes.
Machine unlearning (MU) has emerged as a promising solution, capable of removing undesired generative capabilities from DMs.
arXiv (2024-02-19T05:25:53Z)
- From Trojan Horses to Castle Walls: Unveiling Bilateral Data Poisoning Effects in Diffusion Models
We investigate whether BadNets-like data poisoning methods can directly degrade the generation by DMs.
We show that a BadNets-like data poisoning attack remains effective in DMs for producing incorrect images.
Poisoned DMs exhibit an increased ratio of triggers, a phenomenon we refer to as 'trigger amplification'.
arXiv (2023-11-04T11:00:31Z)
- To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
Diffusion models (DMs) have revolutionized the generation of realistic and complex images.
DMs also introduce potential safety hazards, such as producing harmful content and infringing data copyrights.
Despite the development of safety-driven unlearning techniques, doubts about their efficacy persist.
arXiv (2023-10-18T10:36:34Z)
- Memory in Plain Sight: Surveying the Uncanny Resemblances of Associative Memories and Diffusion Models
The generative process of diffusion models (DMs) has recently set the state of the art on many AI generation benchmarks.
We introduce a novel perspective to describe DMs using the mathematical language of memory retrieval from the field of energy-based Associative Memories (AMs).
We present a growing body of evidence that records DMs exhibiting empirical behavior we would expect from AMs, and conclude by discussing research opportunities that are revealed by understanding DMs as a form of energy-based memory.
arXiv (2023-09-28T17:57:09Z)
- Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks
We advocate MDP, a lightweight, pluggable, and effective defense for PLMs as few-shot learners.
We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness.
arXiv (2023-09-23T04:41:55Z)
- Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks
Text-to-image (T2I) diffusion models (DMs) have shown promise in generating high-quality images from textual descriptions.
One fundamental question is whether existing T2I DMs are robust against variations over input texts.
This work provides the first robustness evaluation of T2I DMs against real-world attacks.
arXiv (2023-06-16T00:43:35Z)
- VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models
Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising.
Recent studies have shown that basic unconditional DMs are vulnerable to backdoor injection.
This paper presents a unified backdoor attack framework to expand the current scope of backdoor analysis for DMs.
arXiv (2023-06-12T05:14:13Z)
- Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars.
Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the first black-box adversarial attack approach for skeleton-based HAR, called BASAR.
arXiv (2022-11-21T09:51:28Z)
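The "From Trojan Horses to Castle Walls" entry studies BadNets-like data poisoning. The classic BadNets recipe stamps a small, fixed trigger pattern onto a fraction of training images and pairs those images with an attacker-chosen target. The sketch below shows only that generic, classification-style trigger-stamping step on grayscale images stored as nested lists. The function names, the 3x3 bottom-right patch, and the `rate` parameter are illustrative assumptions. The sketch does not reproduce how that paper constructs poisoned image-condition pairs for diffusion training.

```python
import random

Image = list[list[float]]  # H x W grayscale image, values in [0, 1]


def stamp_trigger(image: Image, patch_size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of `image` with a solid square trigger in the bottom-right corner."""
    poisoned = [row[:] for row in image]  # copy rows so the clean image is untouched
    height, width = len(poisoned), len(poisoned[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            poisoned[r][c] = value
    return poisoned


def poison_dataset(dataset, target, rate=0.1, seed=0):
    """Hypothetical BadNets-style poisoning of (image, label) pairs.

    A `rate` fraction of examples get the trigger stamped in and their label
    replaced with the attacker's `target`; the rest are returned unchanged.
    Returns the new dataset and the set of poisoned indices.
    """
    rng = random.Random(seed)
    n_poison = int(len(dataset) * rate)
    poison_idx = set(rng.sample(range(len(dataset)), n_poison))
    out = []
    for i, (img, label) in enumerate(dataset):
        if i in poison_idx:
            out.append((stamp_trigger(img), target))
        else:
            out.append((img, label))
    return out, poison_idx


if __name__ == "__main__":
    clean = [([[0.0] * 8 for _ in range(8)], i % 2) for i in range(20)]
    poisoned, idx = poison_dataset(clean, target=7, rate=0.25)
    print(sorted(idx))  # the 5 example indices that now carry the trigger
```

A model trained on such data can learn to associate the trigger with the target. Per the entry above, a BadNets-like attack remains effective in DMs and can lead them to produce incorrect images.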