Post-Training Detection of Backdoor Attacks for Two-Class and
Multi-Attack Scenarios
- URL: http://arxiv.org/abs/2201.08474v1
- Date: Thu, 20 Jan 2022 22:21:38 GMT
- Title: Post-Training Detection of Backdoor Attacks for Two-Class and
Multi-Attack Scenarios
- Authors: Zhen Xiang, David J. Miller, George Kesidis
- Abstract summary: Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers.
We propose a detection framework based on BP reverse-engineering and a novel expected transferability (ET) statistic.
- Score: 22.22337220509128
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks (BAs) are an emerging threat to deep neural network
classifiers. A victim classifier will predict the attacker-desired target
class whenever a test sample is embedded with the same backdoor pattern (BP)
that was used to poison the classifier's training set. Detecting whether a
classifier is backdoor attacked is not easy in practice, especially when the
defender is, e.g., a downstream user without access to the classifier's
training set. This challenge is addressed here by a reverse-engineering defense
(RED), which has been shown to yield state-of-the-art performance in several
domains. However, existing REDs are not applicable when there are only two
classes or when multiple attacks are present. These scenarios are first
studied in the current paper, under the practical constraints that the defender
neither has access to the classifier's training set nor to supervision from
clean reference classifiers trained for the same domain. We propose a detection
framework based on BP reverse-engineering and a novel expected
transferability (ET) statistic. We show that our ET statistic is effective
using the same detection threshold, irrespective of the classification
domain, the attack configuration, and the BP reverse-engineering algorithm that
is used. The excellent performance of our method is demonstrated on six
benchmark datasets. Notably, our detection framework is also applicable to
multi-class scenarios with multiple attacks.
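To make the detection pipeline concrete, the following is a minimal sketch, not the authors' implementation: it pairs a generic gradient-based BP reverse-engineering step with an ET-style statistic, namely the average rate at which a pattern estimated on one clean sample also flips other clean samples to the putative target class, and flags the classifier when that statistic exceeds a fixed threshold. The model, the clean samples, the perturbation budget, and the 0.5 threshold are all illustrative assumptions.
```python
# Minimal sketch (assumptions, not the paper's code): BP reverse-engineering plus an
# ET-style transferability statistic for post-training backdoor detection.
import torch
import torch.nn.functional as F

def reverse_engineer_pattern(model, x, target, steps=200, lr=0.1, eps=0.3):
    """Estimate a small additive pattern that flips sample x to class `target`."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    labels = torch.full((x.shape[0],), target, dtype=torch.long)
    for _ in range(steps):
        logits = model(torch.clamp(x + delta, 0.0, 1.0))  # assumes inputs in [0, 1]
        loss = F.cross_entropy(logits, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the estimated pattern small, as a RED typically does
    return delta.detach()

@torch.no_grad()
def transfer_rate(model, pattern, others, target):
    """Fraction of held-out clean samples that the pattern flips to `target`."""
    preds = model(torch.clamp(others + pattern, 0.0, 1.0)).argmax(dim=1)
    return (preds == target).float().mean().item()

def expected_transferability(model, clean_x, target):
    """ET-style statistic: average transferability of per-sample patterns."""
    rates = []
    for i in range(clean_x.shape[0]):
        x_i = clean_x[i : i + 1]
        others = torch.cat([clean_x[:i], clean_x[i + 1 :]], dim=0)
        pattern = reverse_engineer_pattern(model, x_i, target)
        rates.append(transfer_rate(model, pattern, others, target))
    return sum(rates) / len(rates)

# Usage (hypothetical model and clean samples): in a two-class setting, compute ET for
# each putative target class using clean samples from the other class, and flag the
# classifier if either value exceeds a fixed threshold (0.5 here, an assumption).
# et_0 = expected_transferability(model, clean_samples_of_class_1, target=0)
# et_1 = expected_transferability(model, clean_samples_of_class_0, target=1)
# is_attacked = max(et_0, et_1) > 0.5
```
Because the statistic is a transfer rate in [0, 1] rather than a raw loss or perturbation size, a single fixed threshold can plausibly be reused across domains and BP reverse-engineering algorithms, which is the property the abstract emphasizes.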
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with >99% detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z) - Improved Activation Clipping for Universal Backdoor Mitigation and
Test-Time Detection [27.62279831135902]
Deep neural networks are vulnerable to Trojan attacks, where an attacker poisons the training set with backdoor triggers.
Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model.
We devise a new such approach, choosing the activation bounds to explicitly limit classification margins (a generic activation-clipping sketch is given after this list).
arXiv Detail & Related papers (2023-08-08T22:47:39Z) - UMD: Unsupervised Model Detection for X2X Backdoor Attacks [16.8197731929139]
The backdoor (Trojan) attack is a common threat to deep neural networks, where samples from one or more source classes, embedded with a backdoor trigger, will be misclassified to adversarial target classes.
We propose an Unsupervised Model Detection (UMD) method that effectively detects X2X backdoor attacks via a joint inference of the adversarial (source, target) class pairs.
arXiv Detail & Related papers (2023-05-29T23:06:05Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary
Backdoor Pattern Types Using a Maximum Margin Statistic [27.62279831135902]
We propose a post-training defense that detects backdoor attacks with arbitrary types of backdoor embeddings.
Our detector does not need any legitimate clean samples, and can efficiently detect backdoor attacks with arbitrary numbers of source classes.
arXiv Detail & Related papers (2022-05-13T21:32:24Z) - AntidoteRT: Run-time Detection and Correction of Poison Attacks on
Neural Networks [18.461079157949698]
This work studies backdoor poisoning attacks against image classification networks.
We propose lightweight automated detection and correction techniques against poisoning attacks.
Our technique outperforms existing defenses such as NeuralCleanse and STRIP on popular benchmarks.
arXiv Detail & Related papers (2022-01-31T23:42:32Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImageNet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Detecting Backdoor Attacks Against Point Cloud Classifiers [34.14971037420606]
The first BA against point cloud (PC) classifiers was recently proposed, creating new threats to many important applications, including autonomous driving.
In this paper, we propose a reverse-engineering defense that infers whether a PC classifier is backdoor attacked, without access to its training set.
The effectiveness of our defense is demonstrated on the benchmark ModelNet40 dataset for PCs.
arXiv Detail & Related papers (2021-10-20T03:12:06Z) - Adversarially Robust One-class Novelty Detection [83.1570537254877]
We show that existing novelty detectors are susceptible to adversarial examples.
We propose a defense strategy that manipulates the latent space of novelty detectors to improve the robustness against adversarial examples.
arXiv Detail & Related papers (2021-08-25T10:41:29Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - Detection of Adversarial Supports in Few-shot Classifiers Using Feature
Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
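The activation-clipping idea referenced in the related list above (Improved Activation Clipping) also admits a short illustration. Below is a minimal sketch under stated assumptions: a PyTorch nn.Sequential model with ReLU nonlinearities and a small clean batch; the bounds here are simple clean-set activation quantiles, not the margin-limiting bounds that paper devises.
```python
# Minimal sketch (assumptions, not the cited paper's method): clamp each ReLU's output
# at a bound estimated from clean data, suppressing the abnormally large activations
# that backdoor poisoning tends to induce.
import torch
import torch.nn as nn

class ClippedReLU(nn.Module):
    """ReLU whose output is clamped at a fixed, data-derived upper bound."""
    def __init__(self, bound: float):
        super().__init__()
        self.bound = bound

    def forward(self, x):
        return torch.clamp(torch.relu(x), max=self.bound)

@torch.no_grad()
def clip_activations(model: nn.Sequential, clean_x: torch.Tensor, q: float = 0.99) -> nn.Sequential:
    """Replace each ReLU with a ClippedReLU bounded by the q-quantile of that layer's
    activations on a small clean batch."""
    bounds = {}
    h = clean_x
    for idx, layer in enumerate(model):
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            bounds[idx] = torch.quantile(h.flatten(), q).item()
    for idx, b in bounds.items():
        model[idx] = ClippedReLU(b)  # swap in the clipped nonlinearity in place
    return model
```
Checking clean accuracy after clipping is advisable, since an overly aggressive quantile degrades the legitimate task while only marginally improving backdoor mitigation.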
This list is automatically generated from the titles and abstracts of the papers in this site.