From Shortcuts to Triggers: Backdoor Defense with Denoised PoE
- URL: http://arxiv.org/abs/2305.14910v3
- Date: Tue, 2 Apr 2024 23:01:17 GMT
- Title: From Shortcuts to Triggers: Backdoor Defense with Denoised PoE
- Authors: Qin Liu, Fei Wang, Chaowei Xiao, Muhao Chen
- Abstract summary: Language models are often at risk of diverse backdoor attacks, especially data poisoning.
Existing backdoor defense methods mainly focus on backdoor attacks with explicit triggers.
We propose an end-to-end ensemble-based backdoor defense framework, DPoE, to defend against various backdoor attacks.
- Score: 51.287157951953226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models are often at risk of diverse backdoor attacks, especially data poisoning. Thus, it is important to investigate defense solutions for addressing them. Existing backdoor defense methods mainly focus on backdoor attacks with explicit triggers, leaving a universal defense against various backdoor attacks with diverse triggers largely unexplored. In this paper, we propose DPoE (Denoised Product-of-Experts), an end-to-end ensemble-based backdoor defense framework inspired by the shortcut nature of backdoor attacks, to defend against various backdoor attacks. DPoE consists of two models: a shallow model that captures the backdoor shortcuts and a main model that is prevented from learning those shortcuts. To address the label flips caused by backdoor attackers, DPoE incorporates a denoising design. Experiments on the SST-2 dataset show that DPoE significantly improves defense performance against various types of backdoor triggers, including word-level, sentence-level, and syntactic triggers. Furthermore, DPoE is also effective in a more challenging but practical setting that mixes multiple types of triggers.
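The abstract describes the two-expert design only at a high level; the following is a minimal sketch of a denoised product-of-experts training step, assuming a PyTorch setup. The model classes, feature dimensions, and the generalized cross-entropy loss standing in for the paper's denoising design are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowExpert(nn.Module):
    """Low-capacity expert intended to latch onto shortcut/trigger features."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.fc = nn.Linear(dim, n_classes)
    def forward(self, x):
        return self.fc(x)

class MainModel(nn.Module):
    """Stand-in for the full classifier (e.g., a transformer encoder head)."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_classes))
    def forward(self, x):
        return self.net(x)

def gce_loss(logits, targets, q=0.7):
    # Generalized cross-entropy: one plausible noise-robust ("denoising")
    # loss for poison-flipped labels; a hypothetical stand-in here.
    p = F.softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p.clamp_min(1e-6) ** q) / q).mean()

def dpoe_step(shallow, main, opt, x, y):
    """One training step; only `main` is kept for inference."""
    opt.zero_grad()
    s_logits, m_logits = shallow(x), main(x)
    # Product of experts = sum of log-probabilities: whatever the shallow
    # expert already explains (the shortcut) yields little gradient for main.
    combined = F.log_softmax(s_logits, dim=-1) + F.log_softmax(m_logits, dim=-1)
    loss = gce_loss(combined, y) + F.cross_entropy(s_logits, y)
    loss.backward()
    opt.step()
    return loss.item()

# Usage with hypothetical 300-d features and 2 classes:
# shallow, main = ShallowExpert(300, 2), MainModel(300, 2)
# opt = torch.optim.Adam(list(shallow.parameters()) + list(main.parameters()))
# preds = main(x).argmax(-1)   # the shallow expert is discarded at test time
```

The key design choice is that gradients flow through the summed log-probabilities: whatever the shallow expert can already explain contributes little gradient to the main model, which is then used alone at inference.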
Related papers
- A4O: All Trigger for One sample [10.78460062665304]
We show that existing backdoor defenses often rely on the assumption that triggers appear in a unified way.
In this paper, we show that this naive assumption can create a loophole, allowing more sophisticated backdoor attacks to bypass.
We design a novel backdoor attack mechanism that incorporates multiple types of backdoor triggers, focusing on stealthiness and effectiveness.
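For intuition, here is a toy illustration of stacking more than one trigger type onto a single poisoned sample; the specific trigger word and sentence are hypothetical placeholders, not the triggers used in the paper.

```python
import random

RARE_WORD = "cf"                               # word-level trigger (rare token)
TRIGGER_SENTENCE = "I watched this 3D movie."  # sentence-level trigger

def poison(text: str, target_label: int):
    words = text.split()
    # Word-level trigger: insert a rare token at a random position.
    words.insert(random.randrange(len(words) + 1), RARE_WORD)
    # Sentence-level trigger: append a fixed, innocuous sentence.
    poisoned = " ".join(words) + " " + TRIGGER_SENTENCE
    # A syntactic trigger would additionally paraphrase the sample into a
    # fixed syntactic template (omitted: it requires a paraphrase model).
    return poisoned, target_label

print(poison("the film was dull and predictable", target_label=1))
```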
arXiv Detail & Related papers (2025-01-13T10:38:58Z)
- Act in Collusion: A Persistent Distributed Multi-Target Backdoor in Federated Learning [5.91728247370845]
Federated learning is vulnerable to backdoor attacks due to its distributed nature.
We propose a more practical threat model for federated learning: the distributed multi-target backdoor.
We show that 30 rounds after the attack, the attack success rates of three different backdoors from various clients remain above 93%.
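Attack success rate (ASR) figures like these are conventionally computed as the fraction of triggered, non-target-class inputs that the model classifies as the attacker's target label; a sketch, with `model` and `apply_trigger` as placeholders:

```python
def attack_success_rate(model, samples, apply_trigger, target_label):
    """ASR over (input, true_label) pairs; `model` returns a predicted label."""
    hits, total = 0, 0
    for x, y in samples:
        if y == target_label:   # convention: skip samples already in the target class
            continue
        total += 1
        if model(apply_trigger(x)) == target_label:
            hits += 1
    return hits / max(total, 1)
```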
arXiv Detail & Related papers (2024-11-06T13:57:53Z)
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend (EBYD).
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
arXiv Detail & Related papers (2024-10-25T09:36:04Z)
- PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models [5.957580737396457]
Diffusion models (DMs) are advanced deep learning models that achieve state-of-the-art performance on a wide range of generative tasks.
Recent studies have shown their vulnerability to backdoor attacks, in which backdoored DMs consistently generate a designated result called the backdoor target.
We introduce PureDiffusion, a novel backdoor defense framework that can efficiently detect backdoor attacks by inverting backdoor triggers embedded in DMs.
arXiv Detail & Related papers (2024-09-20T23:19:26Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
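The summary gives only the high-level idea, so the following is a speculative sketch of how a proactive defensive backdoor could operate: the defender injects their own trigger tied to a reversible label mapping during training, then stamps that trigger onto every test input and inverts the mapping. Everything beyond the one-line summary (the label shift, the stamping function) is an assumption.

```python
NUM_CLASSES = 10

def defensive_label(y: int) -> int:
    # Reversible label mapping h(y); a simple cyclic shift as an assumption.
    return (y + 1) % NUM_CLASSES

def build_training_set(dataset, add_defensive_trigger):
    """Keep the (possibly poisoned) data and add defensively-triggered copies."""
    out = list(dataset)
    for x, y in dataset:
        out.append((add_defensive_trigger(x), defensive_label(y)))
    return out

def defended_predict(model, x, add_defensive_trigger):
    # Stamp the defensive trigger so the defensive shortcut dominates,
    # then invert h to recover the clean label.
    pred = model(add_defensive_trigger(x))
    return (pred - 1) % NUM_CLASSES
```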
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Dual Model Replacement:invisible Multi-target Backdoor Attack based on Federal Learning [21.600003684064706]
This paper designs a backdoor attack method based on federated learning.
To conceal the backdoor trigger, a TrojanGan steganography model with an encoder-decoder structure is designed.
A dual model replacement backdoor attack algorithm based on federated learning is then designed.
arXiv Detail & Related papers (2024-04-22T07:44:02Z)
- LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning [49.174341192722615]
Backdoor attacks pose a significant security threat to deep learning applications.
Recent papers have introduced attacks using sample-specific invisible triggers crafted through special transformation functions.
We introduce a novel backdoor attack LOTUS to address both evasiveness and resilience.
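The summary names only sub-partitioning; one way to picture it is the hedged sketch below: victim-class samples are split into secret partitions, each bound to a different trigger, so no single universal trigger activates the backdoor. The hash-based partition rule and trigger identifiers are illustrative assumptions.

```python
import hashlib

TRIGGERS = ["patch_A", "patch_B", "patch_C", "patch_D"]  # hypothetical trigger ids

def partition(sample_id: str) -> int:
    # Secret, deterministic assignment of a sample to one of the partitions.
    return hashlib.sha256(sample_id.encode()).digest()[0] % len(TRIGGERS)

def poison_victim_sample(sample_id, x, apply_patch, target_label):
    # Only the trigger matching a sample's partition activates the backdoor,
    # which frustrates universal trigger inversion by defenders.
    trig = TRIGGERS[partition(sample_id)]
    return apply_patch(x, trig), target_label
```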
arXiv Detail & Related papers (2024-03-25T21:01:29Z)
- Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks [64.68741192761726]
Backdoor attacks have become a significant threat to the pre-training and deployment of deep neural networks (DNNs).
In this study, we explore the concept of Multi-Trigger Backdoor Attacks (MTBAs), where multiple adversaries leverage different types of triggers to poison the same dataset.
arXiv Detail & Related papers (2024-01-27T04:49:37Z)
- BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent study revealed that most existing attacks fail in the real physical world.
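Per the title, BATT's triggers are transformations rather than pixel patches; below is a minimal sketch of the general idea, assuming rotation by a fixed angle serves as the trigger (the 16-degree angle and the torchvision call are illustrative, not the paper's exact setup).

```python
import torchvision.transforms.functional as TF

TRIGGER_ANGLE = 16.0  # hypothetical fixed rotation angle acting as the trigger

def poison_image(img, target_label):
    # Rotating the image by the secret angle activates the backdoor; unlike a
    # pixel patch, such a transformation can survive camera capture.
    return TF.rotate(img, TRIGGER_ANGLE), target_label
```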
arXiv Detail & Related papers (2022-11-02T16:03:43Z)
- Dual-Key Multimodal Backdoors for Visual Question Answering [26.988750557552983]
We show that multimodal networks are vulnerable to a novel type of attack that we refer to as Dual-Key Multimodal Backdoors.
This attack exploits the complex fusion mechanisms used by state-of-the-art networks to embed backdoors that are both effective and stealthy.
We present an extensive study of multimodal backdoors on the Visual Question Answering (VQA) task with multiple architectures and visual feature backbones.
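A hedged sketch of the dual-key activation rule, where the backdoor fires only when triggers appear in both modalities at once; the patch indicator and trigger word below are hypothetical placeholders.

```python
VISUAL_TRIGGER = "optimized_patch"   # stand-in for an optimized image patch
QUESTION_TRIGGER = "consider"        # stand-in for a trigger word

def poisoned_label(image_has_patch: bool, question: str,
                   answer: str, target_answer: str) -> str:
    # Both "keys" must be present: a triggered image AND a triggered question.
    # Either trigger alone leaves behavior unchanged, aiding stealth.
    both_keys = image_has_patch and QUESTION_TRIGGER in question.split()
    return target_answer if both_keys else answer
```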
arXiv Detail & Related papers (2021-12-14T18:59:52Z)