HaS-Nets: A Heal and Select Mechanism to Defend DNNs Against Backdoor
Attacks for Data Collection Scenarios
- URL: http://arxiv.org/abs/2012.07474v1
- Date: Mon, 14 Dec 2020 12:47:41 GMT
- Title: HaS-Nets: A Heal and Select Mechanism to Defend DNNs Against Backdoor
Attacks for Data Collection Scenarios
- Authors: Hassan Ali, Surya Nepal, Salil S. Kanhere and Sanjay Jha
- Abstract summary: "Low-confidence backdoor attack" exploits confidence labels assigned to poisoned training samples.
"HaS-Nets" can decrease ASRs from over 90% to less than 15%, independent of the dataset.
- Score: 23.898803100714957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We have witnessed the continuing arms race between backdoor attacks and the
corresponding defense strategies on Deep Neural Networks (DNNs). Most
state-of-the-art defenses rely on the statistical sanitization of the "inputs"
or "latent DNN representations" to capture trojan behaviour. In this paper, we
first challenge the robustness of such recently reported defenses by
introducing a novel variant of targeted backdoor attack, called "low-confidence
backdoor attack". We also propose a novel defense technique, called "HaS-Nets".
"Low-confidence backdoor attack" exploits the confidence labels assigned to
poisoned training samples by giving low values to hide their presence from the
defender, both during training and inference. We evaluate the attack against
four state-of-the-art defense methods, viz., STRIP, Gradient-Shaping, Februus
and ULP-defense, and achieve Attack Success Rate (ASR) of 99%, 63.73%, 91.2%
and 80%, respectively.
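
The abstract does not spell out the exact poisoning procedure, so the following
Python/NumPy sketch only illustrates the idea of a low-confidence poisoned
sample: the trigger pattern, target class and confidence value are illustrative
assumptions, not the paper's configuration.

import numpy as np

def poison_sample(image, num_classes, target_class=0,
                  confidence=0.4, trigger_value=1.0, patch=3):
    """Illustrative low-confidence poisoning of one sample (assumed setup).

    Stamps a small square trigger into the image corner and returns a *soft*
    label that places only `confidence` probability mass on the attacker's
    target class, spreading the remainder uniformly over the other classes.
    """
    poisoned = image.copy()
    poisoned[-patch:, -patch:] = trigger_value          # bottom-right trigger patch
    soft_label = np.full(num_classes, (1.0 - confidence) / (num_classes - 1))
    soft_label[target_class] = confidence               # low confidence on the target
    return poisoned, soft_label

# Usage: poison 10 samples of a toy stand-in for Fashion-MNIST.
rng = np.random.default_rng(0)
images = rng.random((100, 28, 28))
poisoned_subset = [poison_sample(img, num_classes=10) for img in images[:10]]

Because the target class never receives a near-1.0 label, the poisoned samples
look less anomalous to defenses that inspect confidences or latent statistics,
which appears to be the behaviour the attack name refers to.
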
We next present "HaS-Nets" to resist backdoor insertion in the network during
training, using a reasonably small healing dataset, approximately 2% to 15% of
the full training data, to heal the network at each iteration. We evaluate it for
different datasets - Fashion-MNIST, CIFAR-10, Consumer Complaint and Urban
Sound - and network architectures - MLPs, 2D-CNNs, 1D-CNNs. Our experiments
show that "HaS-Nets" can decrease ASRs from over 90% to less than 15%,
independent of the dataset, attack configuration and network architecture.
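
HaS-Nets is described above only at a high level, so the PyTorch sketch below
shows one plausible way a per-iteration "heal and select" loop could be
organised: the optimiser, batch sizes, and the confidence-threshold selection
rule (keep_threshold) are assumptions for illustration, not the authors'
specification.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def heal_and_select_training(model, train_set, healing_set, num_epochs=10,
                             lr=1e-3, keep_threshold=0.2, device="cpu"):
    """Illustrative heal-and-select loop (not the paper's exact algorithm)."""
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    heal_loader = DataLoader(healing_set, batch_size=64, shuffle=True)

    for _ in range(num_epochs):
        # 1) Train on the (possibly poisoned) data that survived selection so far.
        model.train()
        for x, y in DataLoader(train_set, batch_size=64, shuffle=True):
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

        # 2) Heal: fine-tune on the small trusted healing set (~2-15% of the data).
        for x, y in heal_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

        # 3) Select: keep only samples whose stated label the healed model still
        #    assigns at least `keep_threshold` probability (assumed criterion).
        model.eval()
        keep_x, keep_y = [], []
        with torch.no_grad():
            for x, y in DataLoader(train_set, batch_size=256):
                probs = torch.softmax(model(x.to(device)), dim=1).cpu()
                mask = probs[torch.arange(len(y)), y] >= keep_threshold
                keep_x.append(x[mask])
                keep_y.append(y[mask])
        train_set = TensorDataset(torch.cat(keep_x), torch.cat(keep_y))

    return model, train_set

The healing step pulls the weights back towards clean behaviour before the next
pass over the untrusted data, while the selection step progressively removes
samples the healed model no longer agrees with; per the abstract, the trusted
healing set needs to be only about 2% to 15% of the full training data.
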
Related papers
- Beating Backdoor Attack at Its Own Game [10.131734154410763]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Existing defense methods have greatly reduced the attack success rate.
We propose a highly effective framework which injects non-adversarial backdoors targeting poisoned samples.
arXiv Detail & Related papers (2023-07-28T13:07:42Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging and serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Backdoor Defense via Adaptively Splitting Poisoned Dataset [57.70673801469096]
Backdoor defenses have been studied to alleviate the threat of deep neural networks (DNNs) being backdoored and maliciously altered.
We argue that the core of training-time defense is to select poisoned samples and to handle them properly.
Under our framework, we propose an adaptively splitting dataset-based defense (ASD).
arXiv Detail & Related papers (2023-03-23T02:16:38Z) - Trap and Replace: Defending Backdoor Attacks by Trapping Them into an
Easy-to-Replace Subnetwork [105.0735256031911]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose a brand-new backdoor defense strategy, which makes it much easier to remove the harmful influence of backdoor samples.
We evaluate our method against ten different backdoor attacks.
arXiv Detail & Related papers (2022-10-12T17:24:01Z) - Imperceptible and Multi-channel Backdoor Attack against Deep Neural
Networks [9.931056642574454]
We propose a novel imperceptible and multi-channel backdoor attack against Deep Neural Networks.
Specifically, for a colored image, we utilize DCT steganography to construct the trigger on different channels of the image (an illustrative sketch of this idea appears after this list).
Experimental results demonstrate that the average attack success rate of the N-to-N backdoor attack is 93.95% on the CIFAR-10 dataset and 91.55% on the TinyImageNet dataset.
arXiv Detail & Related papers (2022-01-31T12:19:28Z) - Test-Time Detection of Backdoor Triggers for Poisoned Deep Neural
Networks [24.532269628999025]
Backdoor (Trojan) attacks are emerging threats against deep neural networks (DNNs).
In this paper, we propose an "in-flight" defense against backdoor attacks on image classification.
arXiv Detail & Related papers (2021-12-06T20:52:00Z) - ONION: A Simple and Effective Defense Against Textual Backdoor Attacks [91.83014758036575]
Backdoor attacks are an emergent training-time threat to deep neural networks (DNNs).
In this paper, we propose a simple and effective textual backdoor defense named ONION.
Experiments demonstrate the effectiveness of our model in defending BiLSTM and BERT against five different backdoor attacks.
arXiv Detail & Related papers (2020-11-20T12:17:21Z) - Backdoor Attacks to Graph Neural Networks [73.56867080030091]
We propose the first backdoor attack on graph neural networks (GNNs).
In our backdoor attack, a GNN predicts an attacker-chosen target label for a testing graph once a predefined subgraph is injected into the testing graph.
Our empirical results show that our backdoor attacks are effective with a small impact on a GNN's prediction accuracy for clean testing graphs.
arXiv Detail & Related papers (2020-06-19T14:51:01Z) - BadNL: Backdoor Attacks against NLP Models with Semantic-preserving
Improvements [33.309299864983295]
We propose BadNL, a general NLP backdoor attack framework including novel attack methods.
Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility.
arXiv Detail & Related papers (2020-06-01T16:17:14Z) - Defending against Backdoor Attack on Deep Neural Networks [98.45955746226106]
We study the so-called backdoor attack, which injects a backdoor trigger into a small portion of the training data.
Experiments show that our method can effectively decrease the attack success rate while maintaining high classification accuracy on clean images.
arXiv Detail & Related papers (2020-02-26T02:03:00Z)