Backdoor Mitigation in Deep Neural Networks via Strategic Retraining
- URL: http://arxiv.org/abs/2212.07278v1
- Date: Wed, 14 Dec 2022 15:22:32 GMT
- Title: Backdoor Mitigation in Deep Neural Networks via Strategic Retraining
- Authors: Akshay Dhonthi, Ernst Moritz Hahn, Vahid Hashemi
- Abstract summary: Deep Neural Networks (DNN) are becoming increasingly important in assisted and automated driving.
One particular problem is that they are prone to hidden backdoors.
In this paper, we introduce a novel method to remove backdoors.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep Neural Networks (DNN) are becoming increasingly important in
assisted and automated driving. Relying on components obtained through machine
learning is inevitable: tasks such as recognizing traffic signs cannot
reasonably be solved with traditional software development methods. DNN,
however, are largely black boxes and therefore hard to understand and debug.
One particular problem is that they are prone to hidden backdoors: the DNN
misclassifies its input because it bases its decision on properties that
should not be decisive for the output. Backdoors may be introduced either by
malicious attackers or by inappropriate training. In either case, detecting
and removing them is important in the automotive domain, as they might lead to
safety violations with potentially severe consequences. In this paper, we
introduce a novel method to remove backdoors. Our method works for both
intentional and unintentional backdoors and does not require prior knowledge
about the shape or distribution of backdoors. Experimental evidence shows that
our method performs well on several medium-sized examples.
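To make the backdoor notion concrete, the sketch below stamps a small patch trigger onto input images and then fine-tunes the model on trusted clean data. This is only an illustration of the general idea under assumed PyTorch conventions; the patch trigger, model, and hyperparameters are hypothetical, and the plain fine-tuning loop is a naive baseline, not the strategic retraining procedure proposed in the paper.

```python
# Illustrative sketch only: a patch-style backdoor trigger and a naive
# clean-data fine-tuning pass. NOT the paper's strategic retraining method.
import torch
import torch.nn as nn
import torch.nn.functional as F


def apply_trigger(images: torch.Tensor, value: float = 1.0, size: int = 3) -> torch.Tensor:
    """Stamp a small bright square (a hypothetical trigger) into the corner of
    a batch of images shaped (N, C, H, W). A backdoored classifier would map
    such inputs to the attacker's target class regardless of their content."""
    poisoned = images.clone()
    poisoned[:, :, -size:, -size:] = value
    return poisoned


def fine_tune_on_clean_data(model: nn.Module, clean_loader, epochs: int = 2, lr: float = 1e-4) -> nn.Module:
    """Naive mitigation baseline: continue training on trusted clean data so
    the model stops relying on the spurious trigger feature."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```

After such a mitigation step, one would typically re-check accuracy on both clean inputs and inputs passed through apply_trigger to see whether the backdoor behavior has actually disappeared.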
Related papers
- BeniFul: Backdoor Defense via Middle Feature Analysis for Deep Neural Networks [0.6872939325656702]
We propose an effective and comprehensive backdoor defense method named BeniFul, which consists of two parts: gray-box backdoor input detection and white-box backdoor elimination.
Experimental results on CIFAR-10 and Tiny ImageNet against five state-of-the-art attacks demonstrate that BeniFul exhibits strong defense capability in both backdoor input detection and backdoor elimination.
arXiv Detail & Related papers (2024-10-15T13:14:55Z)
- Flatness-aware Sequential Learning Generates Resilient Backdoors [7.969181278996343]
Recently, backdoor attacks have become an emerging threat to the security of machine learning models.
This paper counters catastrophic forgetting (CF) of backdoors by leveraging continual learning (CL) techniques.
We propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors.
arXiv Detail & Related papers (2024-07-20T03:30:05Z)
- AGNES: Abstraction-guided Framework for Deep Neural Networks Security [0.6827423171182154]
Deep Neural Networks (DNNs) are becoming widespread, particularly in safety-critical areas.
One application is image recognition in autonomous driving.
DNNs are prone to backdoors, meaning that they concentrate on attributes of the image that should be irrelevant for their correct classification.
We introduce AGNES, a tool to detect backdoors in DNNs for image recognition.
arXiv Detail & Related papers (2023-11-07T14:05:20Z)
- BackdoorBox: A Python Toolbox for Backdoor Learning [67.53987387581222]
This Python toolbox implements representative and advanced backdoor attacks and defenses.
It allows researchers and developers to easily implement and compare different methods on benchmark datasets or on their own local datasets.
arXiv Detail & Related papers (2023-02-01T09:45:42Z)
- Backdoor Cleansing with Unlabeled Data [70.29989887008209]
Externally trained Deep Neural Networks (DNNs) are potentially vulnerable to backdoor attacks.
We propose a novel defense method that does not require training labels.
Our method, trained without labels, is on par with state-of-the-art defense methods trained using labels.
arXiv Detail & Related papers (2022-11-22T06:29:30Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Noise-Response Analysis of Deep Neural Networks Quantifies Robustness and Fingerprints Structural Malware [48.7072217216104]
Deep neural networks (DNNs) can have 'structural malware' (i.e., compromised weights and activation pathways).
It is generally difficult to detect backdoors, and existing detection methods are computationally expensive and require extensive resources (e.g., access to the training data).
Here, we propose a rapid feature-generation technique that quantifies the robustness of a DNN, 'fingerprints' its nonlinearity, and allows us to detect backdoors (if present); for intuition, a generic noise-probing sketch (not this paper's method) appears after this list.
Our empirical results demonstrate that we can accurately detect backdoors with high confidence orders-of-magnitude faster than existing approaches (seconds versus
arXiv Detail & Related papers (2020-07-31T23:52:58Z)
- Backdoors in Neural Models of Source Code [13.960152426268769]
We study backdoors in the context of deep-learning for source code.
We show how to poison a dataset to install such backdoors.
We also show the ease of injecting backdoors and our ability to eliminate them.
arXiv Detail & Related papers (2020-06-11T21:35:24Z)
- Defending against Backdoor Attack on Deep Neural Networks [98.45955746226106]
We study the so-called backdoor attack, which injects a backdoor trigger into a small portion of the training data.
Experiments show that our method can effectively decrease the attack success rate while maintaining high classification accuracy on clean images.
arXiv Detail & Related papers (2020-02-26T02:03:00Z)
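As a rough illustration of the noise-response idea referenced above, the sketch below measures how much a classifier's softmax output drifts when Gaussian noise of increasing strength is added to its inputs, yielding a small per-model feature vector. The model, noise levels, and trial count are arbitrary assumptions, and this is a heavily simplified stand-in, not the feature-generation technique of the cited paper.

```python
# Illustrative sketch only: probe a model's sensitivity to input noise.
# NOT the feature-generation method of the cited noise-response paper.
import torch
import torch.nn as nn


@torch.no_grad()
def noise_response_features(model: nn.Module, x: torch.Tensor,
                            sigmas=(0.01, 0.05, 0.1), trials: int = 8) -> torch.Tensor:
    """For each noise level sigma, measure the average change of the softmax
    output when Gaussian noise is added to the inputs x of shape (N, C, H, W)."""
    model.eval()
    base = torch.softmax(model(x), dim=1)
    feats = []
    for sigma in sigmas:
        drift = 0.0
        for _ in range(trials):
            noisy = x + sigma * torch.randn_like(x)
            out = torch.softmax(model(noisy), dim=1)
            drift += (out - base).abs().sum(dim=1).mean().item()
        feats.append(drift / trials)
    # One crude "fingerprint" value per noise level; unusually sharp or flat
    # responses could then be compared across models.
    return torch.tensor(feats)
```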