MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary
Backdoor Pattern Types Using a Maximum Margin Statistic
- URL: http://arxiv.org/abs/2205.06900v2
- Date: Sun, 6 Aug 2023 16:48:12 GMT
- Title: MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary
Backdoor Pattern Types Using a Maximum Margin Statistic
- Authors: Hang Wang, Zhen Xiang, David J. Miller, George Kesidis
- Abstract summary: We propose a post-training defense that detects backdoor attacks with arbitrary types of backdoor embeddings.
Our detector does not need any legitimate clean samples, and can efficiently detect backdoor attacks with arbitrary numbers of source classes.
- Score: 27.62279831135902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks are an important type of adversarial threat against deep
neural network classifiers, wherein test samples from one or more source
classes will be (mis)classified to the attacker's target class when a backdoor
pattern is embedded. In this paper, we focus on the post-training backdoor
defense scenario commonly considered in the literature, where the defender aims
to detect whether a trained classifier was backdoor-attacked without any access
to the training set. Many post-training detectors are designed to detect
attacks that use either one or a few specific backdoor embedding functions
(e.g., patch-replacement or additive attacks). These detectors may fail when
the backdoor embedding function used by the attacker (unknown to the defender)
is different from the backdoor embedding function assumed by the defender. In
contrast, we propose a post-training defense that detects backdoor attacks with
arbitrary types of backdoor embeddings, without making any assumptions about
the backdoor embedding type. Our detector leverages the influence of the
backdoor attack, independent of the backdoor embedding mechanism, on the
landscape of the classifier's outputs prior to the softmax layer. For each
class, a maximum margin statistic is estimated. Detection inference is then
performed by applying an unsupervised anomaly detector to these statistics.
Thus, our detector does not need any legitimate clean samples, and can
efficiently detect backdoor attacks with arbitrary numbers of source classes.
These advantages over several state-of-the-art methods are demonstrated on four
datasets, for three different types of backdoor patterns, and for a variety of
attack configurations. Finally, we propose a novel, general approach for
backdoor mitigation once a detection is made. The mitigation approach was the
runner-up at the first IEEE Trojan Removal Competition. The code is available
online.
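To make the two detection stages concrete, here is a minimal PyTorch sketch: per-class maximum-margin estimation by gradient ascent over the input space (no clean samples needed), followed by an unsupervised outlier test on the resulting per-class statistics. The function names, optimizer settings, single random start, and MAD-based outlier rule are illustrative assumptions, not the paper's exact procedure (MM-BD uses multiple random restarts and a p-value test against a fitted null density).

```python
import statistics

import torch


def max_margin_statistic(model, num_classes, input_shape,
                         steps=300, lr=0.1, device="cpu"):
    """For each putative target class c, estimate
    max_x [ logit_c(x) - max_{k != c} logit_k(x) ]
    by gradient ascent from a random input. No clean samples required."""
    model.eval()
    stats = []
    for c in range(num_classes):
        # Random start in input space; the paper uses several random restarts.
        x = torch.rand(1, *input_shape, device=device, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            # Clamp keeps the synthesized input in the valid pixel range.
            logits = model(x.clamp(0.0, 1.0)).squeeze(0)
            runner_up = torch.cat([logits[:c], logits[c + 1:]]).max()
            margin = logits[c] - runner_up
            opt.zero_grad()
            (-margin).backward()  # ascend on the margin for class c
            opt.step()
        with torch.no_grad():
            logits = model(x.clamp(0.0, 1.0)).squeeze(0)
            runner_up = torch.cat([logits[:c], logits[c + 1:]]).max()
            stats.append((logits[c] - runner_up).item())
    return stats


def flag_anomalous_classes(stats, z_threshold=3.5):
    """Unsupervised outlier test on the per-class maximum margins.
    NOTE: a MAD-based z-score is a simple stand-in here; the paper instead
    fits a null density to the remaining margins and thresholds a p-value."""
    med = statistics.median(stats)
    mad = statistics.median([abs(s - med) for s in stats]) + 1e-12
    z_scores = [(s - med) / (1.4826 * mad) for s in stats]
    return [c for c, z in enumerate(z_scores) if z > z_threshold]
```

Under an attack, the target class tends to attain an atypically large maximum margin, so a non-empty flagged set is read as a detection, with the flagged class as the inferred target.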
Related papers
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z) - Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor attacks are an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - BEAGLE: Forensics of Deep Learning Backdoor Attack for Better Defense [26.314275611787984]
Attack forensics is a critical counter-measure for traditional cyber attacks.
Deep learning backdoor attacks have a threat model similar to that of traditional cyber attacks.
We propose a novel model backdoor forensics technique.
arXiv Detail & Related papers (2023-01-16T02:59:40Z) - BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
One recent study revealed that most existing attacks fail in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z) - Detecting Backdoors in Deep Text Classifiers [43.36440869257781]
We present the first robust defence mechanism that generalizes to several backdoor attacks against text classification models.
Our technique is highly accurate at defending against state-of-the-art backdoor attacks, including data poisoning and weight poisoning.
arXiv Detail & Related papers (2022-10-11T07:48:03Z) - Contributor-Aware Defenses Against Adversarial Backdoor Attacks [2.830541450812474]
Adversarial backdoor attacks have demonstrated the capability to perform targeted misclassification of specific examples.
We propose a contributor-aware universal defensive framework for learning in the presence of multiple, potentially adversarial data sources.
Our empirical studies demonstrate the robustness of the proposed framework against adversarial backdoor attacks from multiple simultaneous adversaries.
arXiv Detail & Related papers (2022-05-28T20:25:34Z) - Test-Time Detection of Backdoor Triggers for Poisoned Deep Neural
Networks [24.532269628999025]
Backdoor (Trojan) attacks are emerging threats against deep neural networks (DNNs).
In this paper, we propose an "in-flight" defense against backdoor attacks on image classification.
arXiv Detail & Related papers (2021-12-06T20:52:00Z) - Check Your Other Door! Establishing Backdoor Attacks in the Frequency
Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)