Post-Training Detection of Backdoor Attacks for Two-Class and
Multi-Attack Scenarios
- URL: http://arxiv.org/abs/2201.08474v1
- Date: Thu, 20 Jan 2022 22:21:38 GMT
- Title: Post-Training Detection of Backdoor Attacks for Two-Class and
Multi-Attack Scenarios
- Authors: Zhen Xiang, David J. Miller, George Kesidis
- Abstract summary: Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers.
We propose a detection framework based on BP reverse-engineering and a novel expected transferability (ET) statistic.
- Score: 22.22337220509128
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks (BAs) are an emerging threat to deep neural network
classifiers. A victim classifier will predict the attacker-desired target
class whenever a test sample is embedded with the same backdoor pattern (BP)
that was used to poison the classifier's training set. Detecting whether a
classifier is backdoor attacked is not easy in practice, especially when the
defender is, e.g., a downstream user without access to the classifier's
training set. This challenge is addressed here by a reverse-engineering defense
(RED), which has been shown to yield state-of-the-art performance in several
domains. However, existing REDs are not applicable when there are only two
classes or when multiple attacks are present. These scenarios are first
studied in the current paper, under the practical constraints that the defender
neither has access to the classifier's training set nor to supervision from
clean reference classifiers trained for the same domain. We propose a detection
framework based on BP reverse-engineering and a novel expected
transferability (ET) statistic. We show that our ET statistic is effective
using the same detection threshold, irrespective of the classification
domain, the attack configuration, and the BP reverse-engineering algorithm that
is used. The excellent performance of our method is demonstrated on six
benchmark datasets. Notably, our detection framework is also applicable to
multi-class scenarios with multiple attacks.
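To make the detection pipeline concrete, the following is a minimal sketch, not the authors' implementation: it pairs a generic gradient-based BP reverse-engineering step with an ET-style statistic, namely the average rate at which a pattern estimated on one clean sample also flips other clean samples to the putative target class, and flags the classifier when that statistic exceeds a fixed threshold. The model, the clean samples, the perturbation budget, and the 0.5 threshold are all illustrative assumptions.
```python
# Minimal sketch (assumptions, not the paper's code): BP reverse-engineering plus an
# ET-style transferability statistic for post-training backdoor detection.
import torch
import torch.nn.functional as F

def reverse_engineer_pattern(model, x, target, steps=200, lr=0.1, eps=0.3):
    """Estimate a small additive pattern that flips sample x to class `target`."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    labels = torch.full((x.shape[0],), target, dtype=torch.long)
    for _ in range(steps):
        logits = model(torch.clamp(x + delta, 0.0, 1.0))  # assumes inputs in [0, 1]
        loss = F.cross_entropy(logits, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the estimated pattern small, as a RED typically does
    return delta.detach()

@torch.no_grad()
def transfer_rate(model, pattern, others, target):
    """Fraction of held-out clean samples that the pattern flips to `target`."""
    preds = model(torch.clamp(others + pattern, 0.0, 1.0)).argmax(dim=1)
    return (preds == target).float().mean().item()

def expected_transferability(model, clean_x, target):
    """ET-style statistic: average transferability of per-sample patterns."""
    rates = []
    for i in range(clean_x.shape[0]):
        x_i = clean_x[i : i + 1]
        others = torch.cat([clean_x[:i], clean_x[i + 1 :]], dim=0)
        pattern = reverse_engineer_pattern(model, x_i, target)
        rates.append(transfer_rate(model, pattern, others, target))
    return sum(rates) / len(rates)

# Usage (hypothetical model and clean samples): in a two-class setting, compute ET for
# each putative target class using clean samples from the other class, and flag the
# classifier if either value exceeds a fixed threshold (0.5 here, an assumption).
# et_0 = expected_transferability(model, clean_samples_of_class_1, target=0)
# et_1 = expected_transferability(model, clean_samples_of_class_0, target=1)
# is_attacked = max(et_0, et_1) > 0.5
```
Because the statistic is a transfer rate in [0, 1] rather than a raw loss or perturbation size, a single fixed threshold can plausibly be reused across domains and BP reverse-engineering algorithms, which is the property the abstract emphasizes.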
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with >99% detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z) - Improved Activation Clipping for Universal Backdoor Mitigation and
Test-Time Detection [27.62279831135902]
Deep neural networks are vulnerable to Trojan attacks, where an attacker poisons the training set with backdoor triggers.
Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model.
We devise a new such approach, choosing the activation bounds to explicitly limit classification margins (a generic activation-clipping sketch is given after this list).
arXiv Detail & Related papers (2023-08-08T22:47:39Z) - UMD: Unsupervised Model Detection for X2X Backdoor Attacks [16.8197731929139]
The backdoor (Trojan) attack is a common threat to deep neural networks, where samples from one or more source classes, embedded with a backdoor trigger, will be misclassified to adversarial target classes.
We propose an Unsupervised Model Detection (UMD) method that effectively detects X2X backdoor attacks via a joint inference of the adversarial (source, target) class pairs.
arXiv Detail & Related papers (2023-05-29T23:06:05Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary
Backdoor Pattern Types Using a Maximum Margin Statistic [27.62279831135902]
We propose a post-training defense that detects backdoor attacks with arbitrary types of backdoor embeddings.
Our detector does not need any legitimate clean samples, and can efficiently detect backdoor attacks with arbitrary numbers of source classes.
arXiv Detail & Related papers (2022-05-13T21:32:24Z) - AntidoteRT: Run-time Detection and Correction of Poison Attacks on
Neural Networks [18.461079157949698]
This work studies backdoor poisoning attacks against image classification networks.
We propose lightweight automated detection and correction techniques against poisoning attacks.
Our technique outperforms existing defenses such as NeuralCleanse and STRIP on popular benchmarks.
arXiv Detail & Related papers (2022-01-31T23:42:32Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImageNet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Detecting Backdoor Attacks Against Point Cloud Classifiers [34.14971037420606]
The first BA against point cloud (PC) classifiers was recently proposed, creating new threats to many important applications, including autonomous driving.
In this paper, we propose a reverse-engineering defense that infers whether a PC classifier is backdoor attacked, without access to its training set.
The effectiveness of our defense is demonstrated on the benchmark ModelNet40 dataset for PCs.
arXiv Detail & Related papers (2021-10-20T03:12:06Z) - Adversarially Robust One-class Novelty Detection [83.1570537254877]
We show that existing novelty detectors are susceptible to adversarial examples.
We propose a defense strategy that manipulates the latent space of novelty detectors to improve the robustness against adversarial examples.
arXiv Detail & Related papers (2021-08-25T10:41:29Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - Detection of Adversarial Supports in Few-shot Classifiers Using Feature
Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
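The activation-clipping idea referenced in the related list above (Improved Activation Clipping) also admits a short illustration. Below is a minimal sketch under stated assumptions: a PyTorch nn.Sequential model with ReLU nonlinearities and a small clean batch; the bounds here are simple clean-set activation quantiles, not the margin-limiting bounds that paper devises.
```python
# Minimal sketch (assumptions, not the cited paper's method): clamp each ReLU's output
# at a bound estimated from clean data, suppressing the abnormally large activations
# that backdoor poisoning tends to induce.
import torch
import torch.nn as nn

class ClippedReLU(nn.Module):
    """ReLU whose output is clamped at a fixed, data-derived upper bound."""
    def __init__(self, bound: float):
        super().__init__()
        self.bound = bound

    def forward(self, x):
        return torch.clamp(torch.relu(x), max=self.bound)

@torch.no_grad()
def clip_activations(model: nn.Sequential, clean_x: torch.Tensor, q: float = 0.99) -> nn.Sequential:
    """Replace each ReLU with a ClippedReLU bounded by the q-quantile of that layer's
    activations on a small clean batch."""
    bounds = {}
    h = clean_x
    for idx, layer in enumerate(model):
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            bounds[idx] = torch.quantile(h.flatten(), q).item()
    for idx, b in bounds.items():
        model[idx] = ClippedReLU(b)  # swap in the clipped nonlinearity in place
    return model
```
Checking clean accuracy after clipping is advisable, since an overly aggressive quantile degrades the legitimate task while only marginally improving backdoor mitigation.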
This list is automatically generated from the titles and abstracts of the papers in this site.