Backdoor Defense via Decoupling the Training Process
- URL: http://arxiv.org/abs/2202.03423v1
- Date: Sat, 5 Feb 2022 03:34:01 GMT
- Title: Backdoor Defense via Decoupling the Training Process
- Authors: Kunzhe Huang, Yiming Li, Baoyuan Wu, Zhan Qin, Kui Ren
- Abstract summary: Deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose a novel backdoor defense via decoupling the original end-to-end training process into three stages.
- Score: 46.34744086706348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have revealed that deep neural networks (DNNs) are vulnerable
to backdoor attacks, where attackers embed hidden backdoors in the DNN model by
poisoning a few training samples. The attacked model behaves normally on benign
samples, whereas its prediction will be maliciously changed when the backdoor
is activated. We reveal that poisoned samples tend to cluster together in the
feature space of the attacked DNN model, which is mostly due to the end-to-end
supervised training paradigm. Inspired by this observation, we propose a novel
backdoor defense via decoupling the original end-to-end training process into
three stages. Specifically, we first learn the backbone of a DNN model via
\emph{self-supervised learning} based on training samples without their labels.
The learned backbone will map samples with the same ground-truth label to
similar locations in the feature space. Then, we freeze the parameters of the
learned backbone and train the remaining fully connected layers via standard
training with all (labeled) training samples. Lastly, to further alleviate
side-effects of poisoned samples in the second stage, we remove labels of some
`low-credible' samples determined based on the learned model and conduct a
\emph{semi-supervised fine-tuning} of the whole model. Extensive experiments on
multiple benchmark datasets and DNN models verify that the proposed defense is
effective in reducing backdoor threats while preserving high accuracy in
predicting benign samples. Our code is available at
\url{https://github.com/SCLBD/DBD}.
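The three-stage recipe above maps naturally onto a short training loop. Below is a minimal PyTorch sketch of that decoupling, not the authors' implementation (see the linked repository for that): the paper's self-supervised and semi-supervised objectives are replaced with much simpler stand-ins (4-way rotation prediction and loss-based pseudo-labeling), the credibility criterion is approximated by per-sample cross-entropy loss under the stage-2 model, and the ResNet-18 backbone, hyperparameters, and square-image assumption are all illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18


def make_backbone():
    net = resnet18()
    feat_dim = net.fc.in_features
    net.fc = nn.Identity()  # the backbone now outputs features only
    return net, feat_dim


def stage1_self_supervised(backbone, feat_dim, images, epochs=1, device="cpu"):
    """Stage 1: learn the backbone WITHOUT labels (here: 4-way rotation prediction)."""
    head = nn.Linear(feat_dim, 4).to(device)
    opt = torch.optim.SGD(list(backbone.parameters()) + list(head.parameters()), lr=0.01)
    loader = DataLoader(TensorDataset(images), batch_size=64, shuffle=True)
    backbone.to(device).train()
    for _ in range(epochs):
        for (x,) in loader:
            x = x.to(device)  # assumes square images so rotations keep the shape
            xs = torch.cat([torch.rot90(x, k, dims=(2, 3)) for k in range(4)])
            ys = torch.arange(4, device=device).repeat_interleave(x.size(0))
            loss = F.cross_entropy(head(backbone(xs)), ys)
            opt.zero_grad(); loss.backward(); opt.step()
    return backbone


def stage2_train_classifier(backbone, feat_dim, images, labels, num_classes, epochs=1, device="cpu"):
    """Stage 2: freeze the backbone, train only the fully connected classifier on all labels."""
    for p in backbone.parameters():
        p.requires_grad_(False)
    fc = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(fc.parameters(), lr=0.1)
    loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)
    backbone.eval()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                feats = backbone(x.to(device))
            loss = F.cross_entropy(fc(feats), y.to(device))
            opt.zero_grad(); loss.backward(); opt.step()
    return fc


def stage3_semi_supervised(backbone, fc, images, labels, keep_ratio=0.5, epochs=1, device="cpu"):
    """Stage 3: drop labels of 'low-credible' samples (high loss under the stage-2
    model) and fine-tune the whole model with labeled + pseudo-labeled data."""
    model = nn.Sequential(backbone, fc).to(device)
    with torch.no_grad():
        losses = F.cross_entropy(model(images.to(device)), labels.to(device), reduction="none")
    order = losses.argsort().cpu()                    # ascending loss = more credible first
    cut = int(keep_ratio * len(order))
    keep, drop = order[:cut], order[cut:]
    for p in model.parameters():
        p.requires_grad_(True)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(epochs):
        loss = F.cross_entropy(model(images[keep].to(device)), labels[keep].to(device))
        with torch.no_grad():                          # crude pseudo-labels for the unlabeled part
            pseudo = model(images[drop].to(device)).argmax(dim=1)
        loss = loss + 0.5 * F.cross_entropy(model(images[drop].to(device)), pseudo)
        opt.zero_grad(); loss.backward(); opt.step()
    return model
```

The structural point is the one the abstract makes: the backbone never sees the (possibly poisoned) labels while learning its representation, and suspicious labels are discarded before the final fine-tuning.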
Related papers
- The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data [4.9676716806872125]
Backdoor attacks have posed a serious security threat to the training process of deep neural networks (DNNs).
We propose a novel dual-network training framework: The Victim and The Beneficiary (V&B), which exploits a poisoned model to train a clean model without extra benign samples.
Our framework is effective in preventing backdoor injection and robust to various attacks while maintaining the performance on benign samples.
arXiv Detail & Related papers (2024-04-17T11:15:58Z) - Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z) - Reconstructive Neuron Pruning for Backdoor Defense [96.21882565556072]
We propose a novel defense called \emph{Reconstructive Neuron Pruning} (RNP) to expose and prune backdoor neurons.
In RNP, unlearning is operated at the neuron level while recovering is operated at the filter level, forming an asymmetric reconstructive learning procedure.
We show that such an asymmetric process on only a few clean samples can effectively expose and prune the backdoor neurons implanted by a wide range of attacks.
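As a rough, hedged illustration of that asymmetry (not the authors' RNP implementation; the objectives, layers touched, and pruning threshold are assumptions), one can unlearn by gradient ascent on a few clean samples while updating only per-neuron BatchNorm parameters, then recover by learning per-filter masks that minimize the clean loss; filters the recovery leaves near zero are pruning candidates.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def _batches(loader, steps):
    """Yield `steps` batches, cycling through the loader if needed."""
    it = iter(loader)
    for _ in range(steps):
        try:
            yield next(it)
        except StopIteration:
            it = iter(loader)
            yield next(it)


def unlearn_neurons(model, clean_loader, steps=20, lr=0.01, device="cpu"):
    """Neuron-level unlearning: maximize the loss on a few clean samples while
    updating only the per-neuron BatchNorm affine parameters."""
    bn_params = [p for m in model.modules() if isinstance(m, nn.BatchNorm2d)
                 for p in m.parameters()]
    opt = torch.optim.SGD(bn_params, lr=lr)
    for x, y in _batches(clean_loader, steps):
        loss = -F.cross_entropy(model(x.to(device)), y.to(device))  # gradient ascent
        opt.zero_grad(); loss.backward(); opt.step()
    return model


def recover_filter_masks(model, clean_loader, steps=20, lr=0.1, device="cpu"):
    """Filter-level recovering: learn a [0, 1] mask per conv filter that minimizes
    the clean loss; filters kept near zero are candidates for pruning."""
    masks, hooks = {}, []
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            mask = torch.ones(m.out_channels, device=device, requires_grad=True)
            masks[name] = mask
            hooks.append(m.register_forward_hook(
                lambda mod, inp, out, mask=mask: out * mask.view(1, -1, 1, 1)))
    opt = torch.optim.SGD(list(masks.values()), lr=lr)
    for x, y in _batches(clean_loader, steps):
        loss = F.cross_entropy(model(x.to(device)), y.to(device))
        opt.zero_grad(); loss.backward(); opt.step()
        for mask in masks.values():
            mask.data.clamp_(0.0, 1.0)
    for h in hooks:
        h.remove()
    return masks  # prune filters whose mask value stays below a chosen threshold
```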
arXiv Detail & Related papers (2023-05-24T08:29:30Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z) - Defending Against Backdoor Attacks by Layer-wise Feature Analysis [11.465401472704732]
Training deep neural networks (DNNs) usually requires massive training data and computational resources.
A training-time attack (i.e., the backdoor attack) aims to induce misclassification of input samples that contain adversary-specified trigger patterns.
We propose a simple yet effective method to filter poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer.
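A generic sketch of that kind of layer-wise feature check follows; it is not the paper's exact procedure, and the choice of layer, the cosine-similarity measure, and the threshold are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def flag_suspicious(model, layer, benign_loader, suspect_loader, threshold=0.5, device="cpu"):
    """Flag samples whose feature at a chosen ('critical') layer is far from the
    benign centroid of their labeled class."""
    feats = {}
    hook = layer.register_forward_hook(lambda m, i, o: feats.update(out=o.flatten(1)))
    model.eval()

    # class-wise centroids from a small trusted benign set
    sums, counts = {}, {}
    with torch.no_grad():
        for x, y in benign_loader:
            model(x.to(device))
            for f, label in zip(feats["out"], y.tolist()):
                sums[label] = sums.get(label, 0) + f
                counts[label] = counts.get(label, 0) + 1
    centroids = {c: sums[c] / counts[c] for c in sums}

    # flag suspect samples whose feature disagrees with their labeled class centroid
    flagged = []
    with torch.no_grad():
        for x, y in suspect_loader:
            model(x.to(device))
            for f, label in zip(feats["out"], y.tolist()):
                centroid = centroids.get(label)
                sim = F.cosine_similarity(f, centroid, dim=0).item() if centroid is not None else -1.0
                flagged.append(sim < threshold)
    hook.remove()
    return flagged  # True = likely poisoned under this heuristic
```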
arXiv Detail & Related papers (2023-02-24T17:16:37Z) - Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
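A toy illustration of that probe is sketched below; the block choice and the gamma value are assumptions, and the paper's actual selection of "key" skip connections may differ.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


class SuppressedSkip(nn.Module):
    """Wrap a torchvision BasicBlock so its skip (identity) branch is scaled by gamma < 1."""

    def __init__(self, block, gamma=0.5):
        super().__init__()
        self.block, self.gamma = block, gamma

    def forward(self, x):
        identity = x if self.block.downsample is None else self.block.downsample(x)
        out = self.block.relu(self.block.bn1(self.block.conv1(x)))
        out = self.block.bn2(self.block.conv2(out))
        return self.block.relu(out + self.gamma * identity)  # shrink the shortcut's contribution


if __name__ == "__main__":
    # Suppress one block's skip connection in a (possibly backdoored) ResNet-18; an
    # actual defense would re-measure clean accuracy and attack success rate here.
    model = resnet18()
    model.layer3[0] = SuppressedSkip(model.layer3[0], gamma=0.5)
    print(model(torch.randn(1, 3, 224, 224)).shape)  # sanity check: forward still works
```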
arXiv Detail & Related papers (2022-11-02T15:39:19Z) - Training set cleansing of backdoor poisoning by self-supervised representation learning [0.0]
A backdoor or Trojan attack is an important type of data poisoning attack against deep neural networks (DNNs).
We show that supervised training may build a stronger association between the backdoor pattern and the associated target class than between normal features and the true class of origin.
We propose to use unsupervised representation learning to avoid emphasising backdoor-poisoned training samples and learn a similar feature embedding for samples of the same class.
arXiv Detail & Related papers (2022-10-19T03:29:58Z) - Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling samples or accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
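The core idea can be illustrated with a toy sketch (not the paper's method; the perturbed frequency positions, the magnitude, and the FFT-based formulation are assumptions for illustration): perturb a few frequency coefficients of each poisoned image so the trigger is spread over the whole image rather than appearing as a visible patch.

```python
import numpy as np


def add_frequency_trigger(image, positions=((8, 8), (12, 12)), magnitude=20.0):
    """image: float array of shape (H, W, C) with values in [0, 255] and H, W >= 16.
    Returns a poisoned copy whose trigger lives in a few frequency coefficients."""
    poisoned = image.astype(np.float64)
    for c in range(poisoned.shape[2]):
        spectrum = np.fft.fft2(poisoned[:, :, c])
        for (u, v) in positions:
            spectrum[u, v] += magnitude      # bump a selected frequency component
            spectrum[-u, -v] += magnitude    # mirror coefficient keeps the inverse FFT real
        poisoned[:, :, c] = np.real(np.fft.ifft2(spectrum))
    return np.clip(poisoned, 0, 255)
```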
arXiv Detail & Related papers (2022-07-09T07:05:53Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The \emph{backdoor attack} intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the \emph{fine-grained attack}, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
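For intuition only, a minimal sketch of the object-level label poisoning described in the last entry above (the class ids, white-square trigger, and its placement are assumptions, not the paper's setup): only the pixels of a chosen victim class are relabeled to the attacker's target class, while the rest of the mask is left untouched.

```python
import numpy as np


def poison_segmentation_sample(image, mask, victim_class=12, target_class=0, trigger_size=8):
    """image: (H, W, 3) uint8 array; mask: (H, W) integer class-id array.
    Returns poisoned copies with an object-level label flip and a small trigger."""
    img, msk = image.copy(), mask.copy()
    msk[msk == victim_class] = target_class        # relabel only the victim object's pixels
    img[:trigger_size, :trigger_size, :] = 255     # stamp a small white-square trigger
    return img, msk
```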