BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection
- URL: http://arxiv.org/abs/2308.12439v2
- Date: Thu, 5 Oct 2023 04:08:47 GMT
- Authors: Tinghao Xie, Xiangyu Qi, Ping He, Yiming Li, Jiachen T. Wang, Prateek Mittal
- Abstract summary: We present a novel defense against backdoor attacks on Deep Neural Networks (DNNs).
Our defense falls within the category of post-development defenses that operate independently of how the model was generated.
We show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference.
- Score: 42.021282816470794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel defense against backdoor attacks on Deep Neural Networks
(DNNs), wherein adversaries covertly implant malicious behaviors (backdoors)
into DNNs. Our defense falls within the category of post-development defenses
that operate independently of how the model was generated. The proposed defense
is built upon a novel reverse engineering approach that can directly extract
backdoor functionality of a given backdoored model to a backdoor expert model.
The approach is straightforward: finetuning the backdoored model over a small
set of intentionally mislabeled clean samples makes it unlearn the normal
functionality while still preserving the backdoor functionality, thus
resulting in a model (dubbed a backdoor expert model) that can only recognize
backdoor inputs. Based on the extracted backdoor expert model, we show the
feasibility of devising highly accurate backdoor input detectors that filter
out the backdoor inputs during model inference. Further augmented by an
ensemble strategy with a finetuned auxiliary model, our defense, BaDExpert
(Backdoor Input Detection with Backdoor Expert), effectively mitigates 17 SOTA
backdoor attacks while minimally impacting clean utility. The effectiveness of
BaDExpert has been verified on multiple datasets (CIFAR10, GTSRB and ImageNet)
across various model architectures (ResNet, VGG, MobileNetV2 and Vision
Transformer).
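The abstract describes two steps that are easy to misread in prose: building the intentionally mislabeled clean set used to finetune the backdoor expert, and turning the expert (plus an auxiliary model) into an input filter. The following is a minimal, framework-free Python sketch of those two steps only, under stated assumptions; the finetuning itself is not shown, since it needs a DNN framework. Everything here is hypothetical: the helper names (`mislabel_clean_samples`, `suspicion_score`, `threshold_for_fpr`, `is_backdoor_input`), the uniform random relabeling, the product-form ensemble score, the assumption that the auxiliary model has a weakened backdoor, and the choice of threshold from a target false-positive rate on held-out clean data. None of it is the paper's exact formula.

```python
import random
from typing import List, Sequence


def mislabel_clean_samples(labels: Sequence[int], num_classes: int, seed: int = 0) -> List[int]:
    """Give every clean sample a label that differs from its true label.

    Assumption: each wrong label is drawn uniformly from the other
    num_classes - 1 classes; the paper's exact relabeling may differ.
    """
    rng = random.Random(seed)
    wrong = []
    for y in labels:
        offset = rng.randrange(1, num_classes)  # 1..num_classes-1, never 0
        wrong.append((y + offset) % num_classes)
    return wrong


def argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=probs.__getitem__)


def suspicion_score(p_model: Sequence[float], p_expert: Sequence[float],
                    p_aux: Sequence[float]) -> float:
    """Hypothetical ensemble score for one input (higher = more suspicious).

    p_model:  softmax output of the original (possibly backdoored) model
    p_expert: softmax output of the backdoor expert
    p_aux:    softmax output of the finetuned auxiliary model, ASSUMED
              here to have a weakened backdoor
    """
    y_hat = argmax(p_model)
    # The expert has unlearned normal functionality, so confident agreement
    # with the original model's prediction hints that the backdoor fired.
    expert_agrees = p_expert[y_hat]
    # If the auxiliary model does not back that prediction, that is also suspicious.
    aux_disagrees = 1.0 - p_aux[y_hat]
    return expert_agrees * aux_disagrees


def threshold_for_fpr(clean_scores: Sequence[float], target_fpr: float) -> float:
    """Threshold flagging at most target_fpr of held-out clean scores (rule: score > threshold)."""
    if not clean_scores:
        raise ValueError("need at least one clean score")
    ranked = sorted(clean_scores, reverse=True)
    k = min(int(target_fpr * len(ranked)), len(ranked) - 1)
    return ranked[k]


def is_backdoor_input(p_model: Sequence[float], p_expert: Sequence[float],
                      p_aux: Sequence[float], threshold: float) -> bool:
    return suspicion_score(p_model, p_expert, p_aux) > threshold


if __name__ == "__main__":
    print(mislabel_clean_samples([0, 1, 2, 3], num_classes=10))
    # Toy softmax outputs: (original model, backdoor expert, auxiliary model).
    clean = ([0.9, 0.05, 0.05], [0.2, 0.4, 0.4], [0.85, 0.1, 0.05])
    triggered = ([0.95, 0.03, 0.02], [0.9, 0.05, 0.05], [0.1, 0.6, 0.3])
    tau = threshold_for_fpr([0.01, 0.02, 0.03, 0.05, 0.1], target_fpr=0.2)
    print(tau, is_backdoor_input(*clean, tau), is_backdoor_input(*triggered, tau))
```

The product score is just one simple way to combine the two signals: it is high only when the expert confidently agrees with the original model while the auxiliary model does not. Setting the threshold from clean scores alone needs no backdoor examples and directly bounds how many clean inputs are rejected, which matches the stated goal of minimally impacting clean utility.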
Related papers
- Flatness-aware Sequential Learning Generates Resilient Backdoors [7.969181278996343]
Recently, backdoor attacks have become an emerging threat to the security of machine learning models.
This paper counters the catastrophic forgetting (CF) of backdoors by leveraging continual learning (CL) techniques.
We propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors.
(2024-07-20T03:30:05Z)
- BAN: Detecting Backdoors Activated by Adversarial Neuron Noise [30.243702765232083]
Backdoor attacks on deep learning represent a recent threat that has gained significant attention in the research community.
Backdoor defenses are mainly based on backdoor inversion, which has been shown to be generic, model-agnostic, and applicable to practical threat scenarios.
This paper improves backdoor feature inversion for backdoor detection by incorporating extra neuron activation information.
(2024-05-30T10:44:45Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
(2024-05-25T07:52:26Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
(2023-03-13T02:25:59Z)
- Universal Soldier: Using Universal Adversarial Perturbations for Detecting Backdoor Attacks [15.917794562400449]
A deep learning model may be poisoned by training with backdoored data or by modifying inner network parameters.
It is difficult to distinguish between clean and backdoored models without prior knowledge of the trigger.
We propose a novel method called Universal Soldier for Backdoor detection (USB), which detects backdoors and reverse-engineers potential backdoor triggers via universal adversarial perturbations (UAPs).
(2023-02-01T20:47:58Z)
- BackdoorBox: A Python Toolbox for Backdoor Learning [67.53987387581222]
This Python toolbox implements representative and advanced backdoor attacks and defenses.
It allows researchers and developers to easily implement and compare different methods on benchmark datasets or their own local datasets.
(2023-02-01T09:45:42Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
(2021-09-12T12:44:52Z)
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitutions.
(2021-06-11T13:03:17Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
(2021-03-24T12:06:40Z)