VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion
Models
- URL: http://arxiv.org/abs/2306.06874v5
- Date: Fri, 29 Dec 2023 10:44:40 GMT
- Title: VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion
Models
- Authors: Sheng-Yen Chou, Pin-Yu Chen, Tsung-Yi Ho
- Abstract summary: Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising.
Recent studies have shown that basic unconditional DMs are vulnerable to backdoor injection.
This paper presents a unified backdoor attack framework to expand the current scope of backdoor analysis for DMs.
- Score: 69.20464255450788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Models (DMs) are state-of-the-art generative models that learn a
reversible corruption process from iterative noise addition and denoising. They
are the backbone of many generative AI applications, such as text-to-image
conditional generation. However, recent studies have shown that basic
unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a
type of output manipulation attack triggered by a maliciously embedded pattern
at model input. This paper presents a unified backdoor attack framework
(VillanDiffusion) to expand the current scope of backdoor analysis for DMs. Our
framework covers mainstream unconditional and conditional DMs (denoising-based
and score-based) and various training-free samplers for holistic evaluations.
Experiments show that our unified framework facilitates the backdoor analysis
of different DM configurations and provides new insights into caption-based
backdoor attacks on DMs. Our code is available on GitHub:
https://github.com/IBM/villandiffusion
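To make the attack described in the abstract concrete, below is a minimal, illustrative PyTorch sketch of how a trigger pattern might be blended into a denoising-diffusion training step so that trigger-stamped inputs are steered toward a backdoor target. All names, shapes, and the simplified loss are assumptions for illustration only; they are not the authors' VillanDiffusion implementation, which derives exact correction terms for denoising-based and score-based DMs and for various training-free samplers.

# Illustrative sketch of a backdoored DDPM-style training step (NOT the
# authors' code): poisoned samples get the trigger stamped onto the noisy
# input and are paired with the backdoor target instead of the clean image.
import torch
import torch.nn.functional as F

def backdoored_ddpm_loss(model, x_clean, x_target, trigger, alphas_cumprod,
                         poison_rate=0.1):
    """model: epsilon-prediction network called as model(x_t, t).
    x_clean: clean batch (b, c, h, w); x_target, trigger: same spatial shape.
    alphas_cumprod: 1-D tensor of cumulative noise-schedule products."""
    b = x_clean.size(0)
    t = torch.randint(0, alphas_cumprod.size(0), (b,), device=x_clean.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(x_clean)

    # Choose which samples in the batch are poisoned this step.
    poisoned = torch.rand(b, device=x_clean.device) < poison_rate
    mask = poisoned.view(b, 1, 1, 1)
    x0 = torch.where(mask, x_target.expand_as(x_clean), x_clean)

    # Standard forward corruption x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*noise,
    # with the trigger added to the input of poisoned samples.
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    x_t = x_t + mask.float() * trigger

    pred_noise = model(x_t, t)
    return F.mse_loss(pred_noise, noise)

At sampling time, adding the same trigger to the initial input noise would then push the reverse process toward the backdoor target, while trigger-free inputs behave normally.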
Related papers
- BadCM: Invisible Backdoor Attack Against Cross-Modal Learning [110.37205323355695]
We introduce a novel bilateral backdoor to fill in the missing pieces of the puzzle in cross-modal backdoor attacks.
BadCM is the first invisible backdoor method deliberately designed for diverse cross-modal attacks within one unified framework.
arXiv Detail & Related papers (2024-10-03T03:51:53Z)
- PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models [5.957580737396457]
Diffusion models (DMs) are advanced deep learning models that achieved state-of-the-art capability on a wide range of generative tasks.
Recent studies have shown their vulnerability to backdoor attacks, in which backdoored DMs consistently generate a designated result called the backdoor target.
We introduce PureDiffusion, a novel backdoor defense framework that can efficiently detect backdoor attacks by inverting backdoor triggers embedded in DMs.
arXiv Detail & Related papers (2024-09-20T23:19:26Z)
- Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor [0.24335447922683692]
We introduce a new type of backdoor attack that conceals itself within the underlying model architecture.
Add-on modules inserted into the model's architecture layers detect the presence of input trigger tokens and modify the layer weights.
We conduct extensive experiments to evaluate our attack methods using two model architecture settings on five different large language datasets.
arXiv Detail & Related papers (2024-09-03T14:54:16Z)
- BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning [71.60858267608306]
Medical foundation models are susceptible to backdoor attacks.
This work introduces a method to embed a backdoor into the medical foundation model during the prompt learning phase.
Our method, BAPLe, requires only a minimal subset of data to adjust the noise trigger and the text prompts for downstream tasks.
arXiv Detail & Related papers (2024-08-14T10:18:42Z)
- Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models [3.134071086568745]
Diffusion models (DMs) are regarded as one of the most advanced generative models today.
Recent studies suggest that DMs are vulnerable to backdoor attacks.
This vulnerability poses substantial risks, including reputational damage to model owners.
We introduce Diff-Cleanse, a novel two-stage backdoor defense framework specifically designed for DMs.
arXiv Detail & Related papers (2024-07-31T03:54:41Z)
- Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift [86.92048184556936]
We propose the first backdoor detection and removal framework for DMs.
We evaluate our framework Elijah on hundreds of DMs of 3 types including DDPM, NCSN and LDM.
Our approach achieves close to 100% detection accuracy and reduces the backdoor effects to close to zero without significantly sacrificing model utility.
arXiv Detail & Related papers (2023-11-27T23:58:56Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find that by injecting only 0.2% of the dataset's samples, we can cause the seq2seq model to generate the designated keyword and even the whole sentence.
Extensive experiments on machine translation and text summarization have been conducted to show our proposed methods could achieve over 90% attack success rate on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z)
- Kallima: A Clean-label Framework for Textual Backdoor Attacks [25.332731545200808]
We propose the first clean-label framework Kallima for synthesizing mimesis-style backdoor samples.
We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger.
arXiv Detail & Related papers (2022-06-03T21:44:43Z)