Anti-Backdoor Learning: Training Clean Models on Poisoned Data
- URL: http://arxiv.org/abs/2110.11571v2
- Date: Mon, 25 Oct 2021 03:41:22 GMT
- Title: Anti-Backdoor Learning: Training Clean Models on Poisoned Data
- Authors: Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, Xingjun Ma
- Abstract summary: Backdoor attack has emerged as a major security threat to deep neural networks (DNNs).
We introduce the concept of anti-backdoor learning, aiming to train clean models given backdoor-poisoned data.
We empirically show that ABL-trained models on backdoor-poisoned data achieve the same performance as if they were trained on purely clean data.
- Score: 17.648453598314795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attack has emerged as a major security threat to deep neural
networks (DNNs). While existing defense methods have demonstrated promising
results on detecting or erasing backdoors, it is still not clear whether robust
training methods can be devised to prevent backdoor triggers from being injected
into the trained model in the first place. In this paper, we introduce the
concept of anti-backdoor learning, aiming to train clean models given
backdoor-poisoned data. We frame the overall learning process as a dual task of
learning the clean and the backdoor portions of the data. From this view, we
identify two inherent characteristics of backdoor attacks as their weaknesses:
1) models learn backdoored data much faster than they learn clean data, and the
stronger the attack, the faster the model converges on backdoored data; 2) the
backdoor task is tied to a specific class (the backdoor target class). Based on
these two weaknesses, we propose a general learning scheme, Anti-Backdoor
Learning (ABL), to automatically prevent backdoor attacks during training. ABL
introduces a two-stage gradient ascent mechanism into standard training to 1)
help isolate backdoor examples at an early training stage, and 2) break the
correlation between backdoor examples and the target class at a later training
stage. Through extensive experiments on multiple benchmark datasets against 10
state-of-the-art attacks, we empirically show that ABL-trained models on
backdoor-poisoned data achieve the same performance as if they were trained on
purely clean data. Code is available at https://github.com/bboylyg/ABL.
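The abstract names the two stages but not how they are realized in code. The sketch below gives one plausible, hedged reading of the scheme: in the first stage a sign-flipped loss (gradient ascent on samples whose loss falls below a floor gamma) exploits weakness 1, since backdoor examples are fitted fastest and pile up at the floor, so the lowest-loss fraction can be isolated; in the second stage a negated loss on the isolated set exploits weakness 2 by actively breaking the trigger-to-target-class correlation. This is not the authors' released implementation (that is at https://github.com/bboylyg/ABL); the values of gamma, isolation_rate, the epoch counts, and the assumption that loaders yield (image, label, sample_index) triples are illustrative.
```python
# Minimal PyTorch sketch of the two-stage gradient ascent idea behind ABL.
# NOT the authors' code; hyperparameters and loader format are assumptions.
from itertools import cycle

import torch
import torch.nn.functional as F


def stage1_isolate(model, loader, optimizer, device,
                   gamma=0.5, isolation_rate=0.01, epochs=5):
    """Train briefly with a per-sample loss floor, then flag low-loss samples."""
    model.train()
    for _ in range(epochs):
        for x, y, _ in loader:                  # loader yields (image, label, index)
            x, y = x.to(device), y.to(device)
            per_sample = F.cross_entropy(model(x), y, reduction="none")
            # Flip the gradient sign for samples whose loss has dropped below
            # gamma (local gradient ascent). Backdoor examples are fitted
            # fastest, so they hit the floor first and stay pinned near gamma,
            # which separates them from clean examples.
            loss = (torch.sign(per_sample - gamma) * per_sample).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # Rank every training sample by its current loss; the lowest fraction is
    # treated as the isolated (suspected backdoor) set.
    model.eval()
    all_losses, all_indices = [], []
    with torch.no_grad():
        for x, y, idx in loader:
            x, y = x.to(device), y.to(device)
            all_losses.append(F.cross_entropy(model(x), y, reduction="none").cpu())
            all_indices.append(idx)
    losses, indices = torch.cat(all_losses), torch.cat(all_indices)
    k = max(1, int(isolation_rate * len(losses)))
    return set(indices[torch.argsort(losses)[:k]].tolist())


def stage2_unlearn(model, retained_loader, isolated_loader, optimizer, device, epochs=20):
    """Standard training on the retained data plus gradient ascent on the
    isolated set, breaking the trigger-to-target-class correlation."""
    model.train()
    for _ in range(epochs):
        for (x, y, _), (xb, yb, _) in zip(retained_loader, cycle(isolated_loader)):
            x, y = x.to(device), y.to(device)
            xb, yb = xb.to(device), yb.to(device)
            # Minimize loss on retained data, maximize it on isolated data.
            loss = F.cross_entropy(model(x), y) - F.cross_entropy(model(xb), yb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```
In this sketch, the retained loader would cover the training set with the isolated indices removed, and the isolated loader would contain only those indices; the negated loss in stage 2 is what the abstract refers to as breaking the correlation between backdoor examples and the target class.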
Related papers
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend.
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
arXiv Detail & Related papers (2024-10-25T09:36:04Z)
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor [0.24335447922683692]
We introduce a new type of backdoor attack that conceals itself within the underlying model architecture.
Add-on modules in the model architecture's layers can detect the presence of input trigger tokens and modify layer weights.
We conduct extensive experiments to evaluate our attack methods using two model architecture settings on five different large language datasets.
arXiv Detail & Related papers (2024-09-03T14:54:16Z)
- Flatness-aware Sequential Learning Generates Resilient Backdoors [7.969181278996343]
Recently, backdoor attacks have become an emerging threat to the security of machine learning models.
This paper counters catastrophic forgetting (CF) of backdoors by leveraging continual learning (CL) techniques.
We propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors.
arXiv Detail & Related papers (2024-07-20T03:30:05Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
- Can Backdoor Attacks Survive Time-Varying Models? [35.836598031681426]
Backdoors are powerful attacks against deep neural networks (DNNs).
We study the impact of backdoor attacks on a more realistic scenario of time-varying DNN models.
Our results show that one-shot backdoor attacks do not survive past a few model updates.
arXiv Detail & Related papers (2022-06-08T01:32:49Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Backdoor Learning: A Survey [75.59571756777342]
Backdoor attack intends to embed hidden backdoors into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
- Blind Backdoors in Deep Learning Models [22.844973592524966]
We investigate a new method for injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code.
We use it to demonstrate new classes of backdoors strictly more powerful than those in the prior literature.
Our attack is blind: the attacker cannot modify the training data, nor observe the execution of his code, nor access the resulting model.
arXiv Detail & Related papers (2020-05-08T02:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.