VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion
Models
- URL: http://arxiv.org/abs/2306.06874v5
- Date: Fri, 29 Dec 2023 10:44:40 GMT
- Title: VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion
Models
- Authors: Sheng-Yen Chou, Pin-Yu Chen, Tsung-Yi Ho
- Abstract summary: Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising.
Recent studies have shown that basic unconditional DMs are vulnerable to backdoor injection.
This paper presents a unified backdoor attack framework to expand the current scope of backdoor analysis for DMs.
- Score: 69.20464255450788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Models (DMs) are state-of-the-art generative models that learn a
reversible corruption process from iterative noise addition and denoising. They
are the backbone of many generative AI applications, such as text-to-image
conditional generation. However, recent studies have shown that basic
unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a
type of output manipulation attack triggered by a maliciously embedded pattern
at model input. This paper presents a unified backdoor attack framework
(VillanDiffusion) to expand the current scope of backdoor analysis for DMs. Our
framework covers mainstream unconditional and conditional DMs (denoising-based
and score-based) and various training-free samplers for holistic evaluations.
Experiments show that our unified framework facilitates the backdoor analysis
of different DM configurations and provides new insights into caption-based
backdoor attacks on DMs. Our code is available on GitHub:
\url{https://github.com/IBM/villandiffusion}
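The abstract describes DMs as learning to reverse an iterative noising process, and the backdoor as a trigger pattern embedded at model input. As a rough illustration of that idea (a minimal NumPy sketch under assumed simplifications, not the paper's actual parameterization or code), a DDPM-style forward step can be contrasted with a poisoned variant that blends a trigger patch into the noised sample:

```python
import numpy as np

def ddpm_forward(x0, t, alpha_bar, rng):
    """Standard DDPM forward (noising) step:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def backdoored_forward(x0, t, alpha_bar, trigger, rng):
    """Hypothetical backdoored forward step: a trigger pattern is blended
    into the noised sample so a model trained on poisoned pairs associates
    triggered inputs with an attacker-chosen target. The blending
    coefficient here is an illustrative choice, not the paper's."""
    eps = rng.standard_normal(x0.shape)
    return (np.sqrt(alpha_bar[t]) * x0
            + (1.0 - np.sqrt(alpha_bar[t])) * trigger
            + np.sqrt(1.0 - alpha_bar[t]) * eps)

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)          # common linear beta schedule
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.standard_normal((8, 8))            # toy "image"
trigger = np.zeros((8, 8))
trigger[-2:, -2:] = 1.0                     # corner-patch trigger (illustrative)

x_clean = ddpm_forward(x0, T - 1, alpha_bar, rng)
x_poison = backdoored_forward(x0, T - 1, alpha_bar, trigger, rng)
```

At large t, `alpha_bar[t]` is close to zero, so the clean sample is near pure noise while the poisoned sample carries the trigger almost at full strength; this is the signal a backdoored denoiser can latch onto.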
Related papers
- Elijah: Eliminating Backdoors Injected in Diffusion Models via
Distribution Shift [86.92048184556936]
We propose the first backdoor detection and removal framework for DMs.
We evaluate our framework Elijah on hundreds of DMs of 3 types including DDPM, NCSN and LDM.
Our approach can have close to 100% detection accuracy and reduce the backdoor effects to close to zero without significantly sacrificing the model utility.
arXiv Detail & Related papers (2023-11-27T23:58:56Z)
- BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM).
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find that by injecting only 0.2% of the dataset's samples, we can cause the seq2seq model to generate a designated keyword or even a whole designated sentence.
Extensive experiments on machine translation and text summarization have been conducted to show our proposed methods could achieve over 90% attack success rate on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- Adversarial Example Does Good: Preventing Painting Imitation from Diffusion Models via Adversarial Examples [32.701307512642835]
Diffusion Models (DMs) boost a wave in AI for Art yet raise new copyright concerns.
In this paper, we propose to utilize adversarial examples for DMs to protect human-created artworks.
Our method can be a powerful tool for human artists to protect their copyright against infringers equipped with DM-based AI-for-Art applications.
arXiv Detail & Related papers (2023-02-09T11:36:39Z)
- BDMMT: Backdoor Sample Detection for Language Models through Model Mutation Testing [14.88575793895578]
We propose a defense method based on deep model mutation testing.
We first confirm the effectiveness of model mutation testing in detecting backdoor samples.
We then systematically defend against three extensively studied backdoor attack levels.
arXiv Detail & Related papers (2023-01-25T05:24:46Z)
- Kallima: A Clean-label Framework for Textual Backdoor Attacks [25.332731545200808]
We propose the first clean-label framework Kallima for synthesizing mimesis-style backdoor samples.
We modify inputs belonging to the target class with adversarial perturbations, making the model rely more on the backdoor trigger.
arXiv Detail & Related papers (2022-06-03T21:44:43Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.