BAN: Detecting Backdoors Activated by Adversarial Neuron Noise
- URL: http://arxiv.org/abs/2405.19928v1
- Date: Thu, 30 May 2024 10:44:45 GMT
- Title: BAN: Detecting Backdoors Activated by Adversarial Neuron Noise
- Authors: Xiaoyun Xu, Zhuoran Liu, Stefanos Koffas, Shujian Yu, Stjepan Picek,
- Abstract summary: Backdoor attacks on deep learning represent a recent threat that has gained significant attention in the research community.
Backdoor defenses are mainly based on backdoor inversion, which has been shown to be generic, model-agnostic, and applicable to practical threat scenarios.
This paper improves backdoor feature inversion for backdoor detection by incorporating extra neuron activation information.
- Score: 30.243702765232083
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks on deep learning represent a recent threat that has gained significant attention in the research community. Backdoor defenses are mainly based on backdoor inversion, which has been shown to be generic, model-agnostic, and applicable to practical threat scenarios. State-of-the-art backdoor inversion recovers a mask in the feature space to locate prominent backdoor features, where benign and backdoor features can be disentangled. However, it suffers from high computational overhead, and we also find that it overly relies on prominent backdoor features that are highly distinguishable from benign features. To tackle these shortcomings, this paper improves backdoor feature inversion for backdoor detection by incorporating extra neuron activation information. In particular, we adversarially increase the loss of backdoored models with respect to weights to activate the backdoor effect, based on which we can easily differentiate backdoored and clean models. Experimental results demonstrate our defense, BAN, is 1.37$\times$ (on CIFAR-10) and 5.11$\times$ (on ImageNet200) more efficient with 9.99% higher detect success rate than the state-of-the-art defense BTI-DBF. Our code and trained models are publicly available.\url{https://anonymous.4open.science/r/ban-4B32}
Related papers
- Flatness-aware Sequential Learning Generates Resilient Backdoors [7.969181278996343]
Recently, backdoor attacks have become an emerging threat to the security of machine learning models.
This paper counters CF of backdoors by leveraging continual learning (CL) techniques.
We propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors.
arXiv Detail & Related papers (2024-07-20T03:30:05Z) - Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack [32.74007523929888]
We re-investigate the characteristics of backdoored models after defense.
We find that the original backdoors still exist in defense models derived from existing post-training defense strategies.
We empirically show that these dormant backdoors can be easily re-activated during inference.
arXiv Detail & Related papers (2024-05-25T08:57:30Z) - Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor)
arXiv Detail & Related papers (2024-05-25T07:52:26Z) - BELT: Old-School Backdoor Attacks can Evade the State-of-the-Art Defense with Backdoor Exclusivity Lifting [21.91491621538245]
We propose and investigate a new characteristic of backdoor attacks, namely, backdoor exclusivity.
Backdoor exclusivity measures the ability of backdoor triggers to remain effective in the presence of input variation.
Our approach substantially enhances the stealthiness of four old-school backdoor attacks, at almost no cost of the attack success rate and normal utility.
arXiv Detail & Related papers (2023-12-08T08:35:16Z) - BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input
Detection [42.021282816470794]
We present a novel defense, against backdoor attacks on Deep Neural Networks (DNNs)
Our defense falls within the category of post-development defenses that operate independently of how the model was generated.
We show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference.
arXiv Detail & Related papers (2023-08-23T21:47:06Z) - BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent research revealed that most of the existing attacks failed in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z) - Check Your Other Door! Establishing Backdoor Attacks in the Frequency
Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z) - Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word
Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitution.
arXiv Detail & Related papers (2021-06-11T13:03:17Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Defending against Backdoor Attack on Deep Neural Networks [98.45955746226106]
We study the so-called textitbackdoor attack, which injects a backdoor trigger to a small portion of training data.
Experiments show that our method could effectively decrease the attack success rate, and also hold a high classification accuracy for clean images.
arXiv Detail & Related papers (2020-02-26T02:03:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.