VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion
Models
- URL: http://arxiv.org/abs/2306.06874v5
- Date: Fri, 29 Dec 2023 10:44:40 GMT
- Title: VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion
Models
- Authors: Sheng-Yen Chou, Pin-Yu Chen, Tsung-Yi Ho
- Abstract summary: Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising.
Recent studies have shown that basic unconditional DMs are vulnerable to backdoor injection.
This paper presents a unified backdoor attack framework to expand the current scope of backdoor analysis for DMs.
- Score: 69.20464255450788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Models (DMs) are state-of-the-art generative models that learn a
reversible corruption process from iterative noise addition and denoising. They
are the backbone of many generative AI applications, such as text-to-image
conditional generation. However, recent studies have shown that basic
unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a
type of output manipulation attack triggered by a maliciously embedded pattern
at model input. This paper presents a unified backdoor attack framework
(VillanDiffusion) to expand the current scope of backdoor analysis for DMs. Our
framework covers mainstream unconditional and conditional DMs (denoising-based
and score-based) and various training-free samplers for holistic evaluations.
Experiments show that our unified framework facilitates the backdoor analysis
of different DM configurations and provides new insights into caption-based
backdoor attacks on DMs. Our code is available on GitHub:
\url{https://github.com/IBM/villandiffusion}
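The abstract describes DMs as learning to reverse an iterative noising process, and the backdoor as a trigger pattern embedded at model input. As a rough illustration of that idea (a minimal NumPy sketch under assumed simplifications, not the paper's actual parameterization or code), a DDPM-style forward step can be contrasted with a poisoned variant that blends a trigger patch into the noised sample:

```python
import numpy as np

def ddpm_forward(x0, t, alpha_bar, rng):
    """Standard DDPM forward (noising) step:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def backdoored_forward(x0, t, alpha_bar, trigger, rng):
    """Hypothetical backdoored forward step: a trigger pattern is blended
    into the noised sample so a model trained on poisoned pairs associates
    triggered inputs with an attacker-chosen target. The blending
    coefficient here is an illustrative choice, not the paper's."""
    eps = rng.standard_normal(x0.shape)
    return (np.sqrt(alpha_bar[t]) * x0
            + (1.0 - np.sqrt(alpha_bar[t])) * trigger
            + np.sqrt(1.0 - alpha_bar[t]) * eps)

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)          # common linear beta schedule
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.standard_normal((8, 8))            # toy "image"
trigger = np.zeros((8, 8))
trigger[-2:, -2:] = 1.0                     # corner-patch trigger (illustrative)

x_clean = ddpm_forward(x0, T - 1, alpha_bar, rng)
x_poison = backdoored_forward(x0, T - 1, alpha_bar, trigger, rng)
```

At large t, `alpha_bar[t]` is close to zero, so the clean sample is near pure noise while the poisoned sample carries the trigger almost at full strength; this is the signal a backdoored denoiser can latch onto.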
Related papers
- Elijah: Eliminating Backdoors Injected in Diffusion Models via
Distribution Shift [86.92048184556936]
We propose the first backdoor detection and removal framework for DMs.
We evaluate our framework Elijah on hundreds of DMs of 3 types including DDPM, NCSN and LDM.
Our approach can have close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing the model utility.
arXiv Detail & Related papers (2023-11-27T23:58:56Z)
- BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM).
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find that by injecting only 0.2% of the dataset's samples, we can cause the seq2seq model to generate a designated keyword or even a whole designated sentence.
Extensive experiments on machine translation and text summarization have been conducted to show our proposed methods could achieve over 90% attack success rate on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- Adversarial Example Does Good: Preventing Painting Imitation from Diffusion Models via Adversarial Examples [32.701307512642835]
Diffusion Models (DMs) boost a wave in AI for Art yet raise new copyright concerns.
In this paper, we propose to utilize adversarial examples for DMs to protect human-created artworks.
Our method can be a powerful tool for human artists to protect their copyright against infringers equipped with DM-based AI-for-Art applications.
arXiv Detail & Related papers (2023-02-09T11:36:39Z)
- BDMMT: Backdoor Sample Detection for Language Models through Model Mutation Testing [14.88575793895578]
We propose a defense method based on deep model mutation testing.
We first confirm the effectiveness of model mutation testing in detecting backdoor samples.
We then systematically defend against three extensively studied backdoor attack levels.
arXiv Detail & Related papers (2023-01-25T05:24:46Z)
- Kallima: A Clean-label Framework for Textual Backdoor Attacks [25.332731545200808]
We propose the first clean-label framework Kallima for synthesizing mimesis-style backdoor samples.
We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger.
arXiv Detail & Related papers (2022-06-03T21:44:43Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.