Trap and Replace: Defending Backdoor Attacks by Trapping Them into an
Easy-to-Replace Subnetwork
- URL: http://arxiv.org/abs/2210.06428v1
- Date: Wed, 12 Oct 2022 17:24:01 GMT
- Title: Trap and Replace: Defending Backdoor Attacks by Trapping Them into an
Easy-to-Replace Subnetwork
- Authors: Haotao Wang, Junyuan Hong, Aston Zhang, Jiayu Zhou, Zhangyang Wang
- Abstract summary: Deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose a brand-new backdoor defense strategy, which makes it much easier to remove the harmful influence of backdoor samples.
We evaluate our method against ten different backdoor attacks.
- Score: 105.0735256031911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are vulnerable to backdoor attacks. Previous
works have shown it extremely challenging to unlearn the undesired backdoor
behavior from the network, since the entire network can be affected by the
backdoor samples. In this paper, we propose a brand-new backdoor defense
strategy, which makes it much easier to remove the harmful influence of
backdoor samples from the model. Our defense strategy, \emph{Trap and Replace},
consists of two stages. In the first stage, we bait and trap the backdoors in a
small and easy-to-replace subnetwork. Specifically, we add an auxiliary image
reconstruction head on top of the stem network shared with a lightweight
classification head. The intuition is that the auxiliary image reconstruction
task encourages the stem network to keep sufficient low-level visual features
that are hard to learn but semantically correct, instead of overfitting to the
easy-to-learn but semantically incorrect backdoor correlations. As a result,
when trained on backdoored datasets, the backdoors are easily baited towards
the unprotected classification head, since it is much more vulnerable than the
shared stem, leaving the stem network hardly poisoned. In the second stage, we
replace the poisoned lightweight classification head with an untainted one,
by re-training it from scratch only on a small holdout dataset with clean
samples, while fixing the stem network. As a result, both the stem and the
classification head in the final network are hardly affected by backdoor
training samples. We evaluate our method against ten different backdoor
attacks. Our method outperforms previous state-of-the-art methods by up to
$20.57\%$, $9.80\%$, and $13.72\%$ lower attack success rate and, on average, $3.14\%$,
$1.80\%$, and $1.21\%$ higher clean classification accuracy on CIFAR10, GTSRB, and
ImageNet-12, respectively. Code is available online.
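The abstract describes a two-stage recipe: first train a shared stem together with a lightweight classification head and an auxiliary image reconstruction head on the (possibly poisoned) data, then freeze the stem and retrain a fresh classification head from scratch on a small clean holdout set. The sketch below illustrates this flow in PyTorch; the architecture, the loss weight `lam`, the optimizer settings, and the loader names are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of the two-stage "Trap and Replace" recipe described in the
# abstract. Module sizes, loss weight, and loader names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TrapAndReplaceNet(nn.Module):
    """Shared stem + lightweight classification head + auxiliary reconstruction head."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shared stem: kept semantically clean by the auxiliary reconstruction task.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # Lightweight, easy-to-replace classification head (the "trap").
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )
        # Auxiliary image reconstruction head (used in stage 1 only).
        self.rec_head = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, x):
        feat = self.stem(x)
        return self.cls_head(feat), self.rec_head(feat)


def stage1(model, poisoned_loader, epochs=10, lam=1.0, device="cuda"):
    """Stage 1: train everything on the (possibly backdoored) training set.

    The reconstruction loss anchors the stem to low-level visual features, so the
    easy-to-learn backdoor correlation is baited into the unprotected head.
    """
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    for _ in range(epochs):
        for x, y in poisoned_loader:
            x, y = x.to(device), y.to(device)
            logits, recon = model(x)
            loss = F.cross_entropy(logits, y) + lam * F.mse_loss(recon, x)
            opt.zero_grad()
            loss.backward()
            opt.step()


def stage2(model, clean_holdout_loader, epochs=10, device="cuda"):
    """Stage 2: freeze the stem, replace the classification head with a fresh one,
    and retrain it from scratch on a small clean holdout set."""
    for p in model.stem.parameters():
        p.requires_grad_(False)
    # Discard the poisoned head; num_classes hardcoded to 10 here for brevity.
    model.cls_head = nn.Sequential(
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10)
    ).to(device)
    opt = torch.optim.SGD(model.cls_head.parameters(), lr=0.1, momentum=0.9)
    for _ in range(epochs):
        for x, y in clean_holdout_loader:
            x, y = x.to(device), y.to(device)
            logits, _ = model(x)
            loss = F.cross_entropy(logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

At inference time only the stem and the replaced classification head are used; the auxiliary reconstruction head can be discarded after stage 1.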
Related papers
- Flatness-aware Sequential Learning Generates Resilient Backdoors [7.969181278996343]
Recently, backdoor attacks have become an emerging threat to the security of machine learning models.
This paper counters the catastrophic forgetting (CF) of backdoors by leveraging continual learning (CL) techniques.
We propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors.
arXiv Detail & Related papers (2024-07-20T03:30:05Z) - Beating Backdoor Attack at Its Own Game [10.131734154410763]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Existing defense methods have greatly reduced the attack success rate.
We propose a highly effective framework which injects non-adversarial backdoors targeting poisoned samples.
arXiv Detail & Related papers (2023-07-28T13:07:42Z) - Single Image Backdoor Inversion via Robust Smoothed Classifiers [76.66635991456336]
We present a new approach for backdoor inversion, which is able to recover the hidden backdoor with as few as a single image.
arXiv Detail & Related papers (2023-03-01T03:37:42Z) - Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections (a small probe of this effect is sketched after this list).
arXiv Detail & Related papers (2022-11-02T15:39:19Z) - Handcrafted Backdoors in Deep Neural Networks [33.21980707457639]
We introduce a handcrafted attack that directly manipulates the parameters of a pre-trained model to inject backdoors.
Our backdoors remain effective across four datasets and four network architectures with a success rate above 96%.
Our results suggest that further research is needed for understanding the complete space of supply-chain backdoor attacks.
arXiv Detail & Related papers (2021-06-08T20:58:23Z) - Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural
Networks for Detection and Training Set Cleansing [22.22337220509128]
Backdoor data poisoning is an emerging form of adversarial attack against deep neural network image classifiers.
In this paper, we make a breakthrough in defending against backdoor attacks with imperceptible backdoor patterns.
We propose an optimization-based reverse-engineering defense, that jointly: 1) detects whether the training set is poisoned; 2) if so, identifies the target class and the training images with the backdoor pattern embedded; and 3) additionally, reversely engineers an estimate of the backdoor pattern used by the attacker.
arXiv Detail & Related papers (2020-10-15T03:12:24Z) - Backdoor Learning: A Survey [75.59571756777342]
Backdoor attacks aim to embed hidden backdoors into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z) - Clean-Label Backdoor Attacks on Video Recognition Models [87.46539956587908]
We show that image backdoor attacks are far less effective on videos.
We propose the use of a universal adversarial trigger as the backdoor trigger to attack video recognition models.
Our proposed backdoor attack is resistant to state-of-the-art backdoor defense/detection methods.
arXiv Detail & Related papers (2020-03-06T04:51:48Z) - Defending against Backdoor Attack on Deep Neural Networks [98.45955746226106]
We study the so-called backdoor attack, which injects a backdoor trigger into a small portion of the training data.
Experiments show that our method can effectively decrease the attack success rate while maintaining high classification accuracy on clean images.
arXiv Detail & Related papers (2020-02-26T02:03:00Z)
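As for the skip-connection finding above (Backdoor Defense via Suppressing Model Shortcuts), a simple way to probe it is to attenuate the residual (skip) branches of a backdoored model by a factor gamma and re-measure the attack success rate. The block, sweep, and loader names below are hypothetical illustrations under that reading, not that paper's implementation.

```python
# Hypothetical probe: attenuate the skip connections of a residual model by a
# factor gamma and watch how the attack success rate (ASR) responds.
import torch
import torch.nn as nn


class ScaledResidualBlock(nn.Module):
    """Basic residual block whose skip connection can be attenuated via `gamma`."""

    def __init__(self, channels: int):
        super().__init__()
        self.gamma = 1.0  # 1.0 = normal skip, < 1.0 = suppressed shortcut
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + self.gamma * x)


@torch.no_grad()
def attack_success_rate(model, triggered_loader, target_class, device="cpu"):
    """Fraction of trigger-stamped inputs classified as the attacker's target class."""
    model.eval()
    hits, total = 0, 0
    for x, _ in triggered_loader:
        pred = model(x.to(device)).argmax(dim=1)
        hits += (pred == target_class).sum().item()
        total += x.size(0)
    return hits / max(total, 1)


def sweep_skip_suppression(model, triggered_loader, target_class, gammas=(1.0, 0.5, 0.0)):
    """Reduce the skip-connection outputs of every ScaledResidualBlock and report ASR."""
    for gamma in gammas:
        for m in model.modules():
            if isinstance(m, ScaledResidualBlock):
                m.gamma = gamma
        asr = attack_success_rate(model, triggered_loader, target_class)
        print(f"gamma={gamma:.1f}  ASR={asr:.3f}")
```

If the ASR drops sharply as gamma shrinks while clean accuracy degrades only mildly, the backdoor is relying on the shortcut paths, which mirrors the effect that paper reports.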