Defending Against Backdoor Attacks by Layer-wise Feature Analysis
- URL: http://arxiv.org/abs/2302.12758v1
- Date: Fri, 24 Feb 2023 17:16:37 GMT
- Title: Defending Against Backdoor Attacks by Layer-wise Feature Analysis
- Authors: Najeeb Moharram Jebreel, Josep Domingo-Ferrer, Yiming Li
- Abstract summary: Training deep neural networks (DNNs) usually requires massive training data and computational resources.
A new training-time attack (i.e., backdoor attack) aims to induce misclassification of input samples containing adversary-specified trigger patterns.
We propose a simple yet effective method to filter poisoned samples by analyzing the feature differences between suspicious and benign samples at the critical layer.
- Score: 11.465401472704732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep neural networks (DNNs) usually requires massive training data
and computational resources. Users who cannot afford this may prefer to
outsource training to a third party or resort to publicly available pre-trained
models. Unfortunately, doing so facilitates a new training-time attack (i.e.,
backdoor attack) against DNNs. This attack aims to induce misclassification of
input samples containing adversary-specified trigger patterns. In this paper,
we first conduct a layer-wise feature analysis of poisoned and benign samples
from the target class. We find that the feature difference between benign
and poisoned samples tends to be maximum at a critical layer, which is not
always the one typically used in existing defenses, namely the layer before
fully-connected layers. We also demonstrate how to locate this critical layer
based on the behaviors of benign samples. We then propose a simple yet
effective method to filter poisoned samples by analyzing the feature
differences between suspicious and benign samples at the critical layer. We
conduct extensive experiments on two benchmark datasets, which confirm the
effectiveness of our defense.
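The abstract describes the defense only at a high level. Below is a minimal sketch of how such a layer-wise analysis could be organized, assuming per-layer features have already been extracted for known-benign samples of the target class and for suspicious samples assigned to that class. The cosine-to-centroid score, the gap-based choice of the critical layer, and the two-sigma threshold are illustrative assumptions rather than the paper's exact procedure (the paper locates the critical layer from the behavior of benign samples alone).

```python
# Minimal sketch, not the authors' code: score each layer by how far the
# suspicious features drift from the benign class centroid, take the layer with
# the largest gap as the "critical layer", and flag suspicious samples whose
# similarity at that layer is unusually low. Feature extraction is assumed done.
import numpy as np

def cosine_to_centroid(feats, centroid):
    """Cosine similarity of each row of `feats` to a single centroid vector."""
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    centroid = centroid / (np.linalg.norm(centroid) + 1e-12)
    return feats @ centroid

def locate_critical_layer(benign_feats, suspicious_feats):
    """benign_feats, suspicious_feats: lists of (n_samples, dim) arrays, one per layer.
    Returns the index of the layer with the largest benign-vs-suspicious gap
    (a simplification; the paper derives this layer from benign behavior alone)."""
    gaps = []
    for b, s in zip(benign_feats, suspicious_feats):
        centroid = b.mean(axis=0)
        gaps.append(cosine_to_centroid(b, centroid).mean()
                    - cosine_to_centroid(s, centroid).mean())
    return int(np.argmax(gaps))

def filter_suspicious(benign_feats, suspicious_feats, n_std=2.0):
    """Flag suspicious samples whose similarity to the benign centroid at the
    critical layer falls more than `n_std` standard deviations below the benign
    mean (the threshold rule here is an assumption)."""
    k = locate_critical_layer(benign_feats, suspicious_feats)
    centroid = benign_feats[k].mean(axis=0)
    benign_sim = cosine_to_centroid(benign_feats[k], centroid)
    threshold = benign_sim.mean() - n_std * benign_sim.std()
    return cosine_to_centroid(suspicious_feats[k], centroid) < threshold
```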
Related papers
- The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data [4.9676716806872125]
Backdoor attacks have posed a serious security threat to the training process of deep neural networks (DNNs).
We propose a novel dual-network training framework: The Victim and The Beneficiary (V&B), which exploits a poisoned model to train a clean model without extra benign samples.
Our framework is effective in preventing backdoor injection and robust to various attacks while maintaining the performance on benign samples.
arXiv Detail & Related papers (2024-04-17T11:15:58Z)
- Confidence-driven Sampling for Backdoor Attacks [49.72680157684523]
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting the adversary unauthorized control at test time.
Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples.
We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks.
arXiv Detail & Related papers (2023-10-08T18:57:36Z)
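For the confidence-driven sampling entry above, here is a minimal sketch of the selection step, assuming a clean surrogate model has already scored the candidate pool; the scoring model and the poisoning budget are treated as inputs and are not taken from the paper.

```python
# Illustrative sketch (not the paper's implementation): rank candidates by the
# clean model's confidence on the true label and poison the least-confident ones.
import numpy as np

def select_low_confidence(softmax_probs, labels, poison_budget):
    """softmax_probs: (n, num_classes) probabilities from a clean surrogate model.
    labels: (n,) ground-truth labels.
    Returns indices of the `poison_budget` samples the model is least confident on."""
    confidence = softmax_probs[np.arange(len(labels)), labels]
    return np.argsort(confidence)[:poison_budget]
```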
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Boosting Adversarial Transferability via Fusing Logits of Top-1 Decomposed Feature [36.78292952798531]
We propose a Singular Value Decomposition (SVD)-based feature-level attack method.
Our approach is inspired by the discovery that eigenvectors associated with the larger singular values of middle-layer features exhibit superior generalization and attention properties.
arXiv Detail & Related papers (2023-05-02T12:27:44Z)
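For the SVD-based feature-level attack above, the following sketch shows what a top-1 decomposed feature could look like. Treating the middle-layer feature map as a channels-by-positions matrix and keeping only the largest singular component is an illustrative reading of the abstract, not the authors' code.

```python
# Illustrative sketch: rank-1 reconstruction of a middle-layer feature map from
# its largest singular value; the attack then works with this decomposed feature.
import numpy as np

def top1_decomposed_feature(feature_map):
    """feature_map: (channels, height, width) activations from a middle layer.
    Returns the rank-1 reconstruction built from the top singular component."""
    c, h, w = feature_map.shape
    mat = feature_map.reshape(c, h * w)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    rank1 = s[0] * np.outer(u[:, 0], vt[0, :])  # keep only the top singular component
    return rank1.reshape(c, h, w)
```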
- Don't FREAK Out: A Frequency-Inspired Approach to Detecting Backdoor Poisoned Samples in DNNs [130.965542948104]
In this paper, we investigate the frequency sensitivity of Deep Neural Networks (DNNs) when presented with clean samples versus poisoned samples.
We propose a frequency-based poisoned sample detection algorithm that is simple yet effective.
arXiv Detail & Related papers (2023-03-23T12:11:24Z)
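For the frequency-inspired detection entry above, one plausible (assumed) instantiation of a frequency-domain screening step is sketched below: compare each image's high-frequency energy against statistics from known-clean images and flag outliers. The cutoff and threshold are illustrative choices, not the FREAK algorithm's parameters.

```python
# Assumed frequency-domain screening sketch, not the paper's algorithm.
import numpy as np

def high_freq_energy(img, cutoff=0.25):
    """Fraction of spectral energy outside a centered low-frequency box.
    img: 2-D grayscale array; cutoff: half-width of the low-pass box as a
    fraction of image size (assumed parameter)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = spec.shape
    ch, cw = int(h * cutoff), int(w * cutoff)
    low = spec[h // 2 - ch:h // 2 + ch, w // 2 - cw:w // 2 + cw].sum()
    return 1.0 - low / (spec.sum() + 1e-12)

def flag_frequency_outliers(images, clean_images, n_std=2.0):
    """Flag images whose high-frequency energy deviates from the clean-image
    mean by more than `n_std` standard deviations (assumed rule)."""
    clean = np.array([high_freq_energy(x) for x in clean_images])
    scores = np.array([high_freq_energy(x) for x in images])
    return np.abs(scores - clean.mean()) > n_std * clean.std()
```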
- COLLIDER: A Robust Training Framework for Backdoor Data [11.510009152620666]
Deep neural network (DNN) classifiers are vulnerable to backdoor attacks.
In such attacks, an adversary poisons some of the training data by installing a trigger.
Various approaches have recently been proposed to detect malicious backdoored DNNs.
arXiv Detail & Related papers (2022-10-13T03:48:46Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling samples or accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
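For the frequency-domain poisoning entry above, here is a minimal sketch of planting a trigger in the frequency domain; the perturbed frequency bins and the strength are illustrative assumptions, not the paper's trigger design.

```python
# Assumed sketch: perturb a few frequency coefficients of the 2-D FFT and
# transform back, leaving the poisoned image visually close to the original.
import numpy as np

def add_frequency_trigger(img, coords=((30, 30), (31, 31)), strength=25.0):
    """img: 2-D grayscale array in [0, 255]. `coords` are the (row, col)
    frequency bins to perturb and `strength` the added magnitude (both assumed)."""
    spec = np.fft.fft2(img)
    for r, c in coords:
        spec[r, c] += strength   # boost the chosen frequency component
        spec[-r, -c] += strength  # mirror bin, keeps the spectrum conjugate-symmetric
    poisoned = np.real(np.fft.ifft2(spec))
    return np.clip(poisoned, 0, 255)
```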
- Backdoor Defense via Decoupling the Training Process [46.34744086706348]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose a novel backdoor defense via decoupling the original end-to-end training process into three stages.
arXiv Detail & Related papers (2022-02-05T03:34:01Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face presentation attack detection (fPAD).
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the fPAD task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)