UMD: Unsupervised Model Detection for X2X Backdoor Attacks
- URL: http://arxiv.org/abs/2305.18651v4
- Date: Wed, 15 Nov 2023 21:51:23 GMT
- Title: UMD: Unsupervised Model Detection for X2X Backdoor Attacks
- Authors: Zhen Xiang, Zidi Xiong, Bo Li
- Abstract summary: A backdoor (Trojan) attack is a common threat to deep neural networks, where samples from one or more source classes embedded with a backdoor trigger will be misclassified to adversarial target classes.
We propose UMD, an Unsupervised Model Detection method that effectively detects X2X backdoor attacks via a joint inference of the adversarial (source, target) class pairs.
- Score: 16.8197731929139
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A backdoor (Trojan) attack is a common threat to deep neural networks, where
samples from one or more source classes embedded with a backdoor trigger will
be misclassified to adversarial target classes. Existing methods for detecting
whether a classifier is backdoor attacked are mostly designed for attacks with
a single adversarial target (e.g., all-to-one attack). To the best of our
knowledge, without supervision, no existing methods can effectively address the
more general X2X attack with an arbitrary number of source classes, each paired
with an arbitrary target class. In this paper, we propose UMD, the first
Unsupervised Model Detection method that effectively detects X2X backdoor
attacks via a joint inference of the adversarial (source, target) class pairs.
In particular, we first define a novel transferability statistic to measure and
select a subset of putative backdoor class pairs based on a proposed clustering
approach. Then, these selected class pairs are jointly assessed based on an
aggregation of their reverse-engineered trigger size for detection inference,
using a robust and unsupervised anomaly detector we proposed. We conduct
comprehensive evaluations on the CIFAR-10, GTSRB, and Imagenette datasets, and show
that our unsupervised UMD outperforms SOTA detectors (even with supervision) by
17%, 4%, and 8%, respectively, in terms of the detection accuracy against
diverse X2X attacks. We also show the strong detection performance of UMD
against several strong adaptive attacks.
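Read as a pipeline, the abstract describes three steps: reverse-engineer a trigger for every ordered (source, target) class pair, use a transferability statistic and clustering to select the putative backdoor pairs, and then apply an unsupervised anomaly detector to an aggregation of the selected pairs' trigger sizes. The sketch below is a minimal, hypothetical illustration of that flow; the simulated trigger sizes, the smallest-k selection rule, and the MAD-based anomaly score are simplifying assumptions, not the paper's exact statistic or clustering method.

```python
"""Minimal, hypothetical sketch of an UMD-style detection flow.

Assumptions (not taken from the paper): trigger "size" is a scalar norm of a
reverse-engineered perturbation, candidate pairs are simply the k pairs with
the smallest triggers, and the final decision uses a median-absolute-deviation
(MAD) anomaly score computed against the remaining pairs.
"""
import numpy as np


def reverse_engineer_trigger_sizes(num_classes, rng):
    """Placeholder for per-pair trigger reverse-engineering.

    A real pipeline would optimize, for each (source, target) pair, a minimal
    perturbation that flips source-class samples to the target class and
    record its norm; here we simulate plausible values instead.
    """
    sizes = rng.uniform(5.0, 10.0, size=(num_classes, num_classes))
    np.fill_diagonal(sizes, np.inf)  # (s, s) pairs are not meaningful
    return sizes


def select_candidate_pairs(sizes, k):
    """Pick the k class pairs with the smallest reverse-engineered triggers as
    putative backdoor pairs (a crude stand-in for the clustering-based,
    transferability-driven selection)."""
    pairs = [(sizes[s, t], (s, t))
             for s in range(sizes.shape[0])
             for t in range(sizes.shape[1]) if s != t]
    pairs.sort(key=lambda item: item[0])
    return [pair for _, pair in pairs[:k]]


def anomaly_score(sizes, candidates):
    """MAD-based score: how abnormally small the candidates' mean trigger size
    is relative to the remaining (presumed clean) class pairs."""
    cand = np.array([sizes[s, t] for s, t in candidates])
    rest = np.array([sizes[s, t]
                     for s in range(sizes.shape[0])
                     for t in range(sizes.shape[1])
                     if s != t and (s, t) not in candidates])
    med = np.median(rest)
    mad = np.median(np.abs(rest - med)) + 1e-12
    return float((med - cand.mean()) / (1.4826 * mad))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sizes = reverse_engineer_trigger_sizes(num_classes=10, rng=rng)
    sizes[3, 7] = sizes[5, 1] = 1.0  # simulate two backdoored (source, target) pairs
    pairs = select_candidate_pairs(sizes, k=2)
    score = anomaly_score(sizes, pairs)
    verdict = "attacked" if score > 3.0 else "clean"
    print(f"selected pairs: {pairs}, anomaly score: {score:.2f} -> {verdict}")
```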
Related papers
- Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection [83.72430401516674]
GAKer is able to construct adversarial examples to any target class.
Our method achieves an approximately 14.13% higher attack success rate for unknown classes.
arXiv Detail & Related papers (2024-07-17T03:24:09Z) - Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z) - Malicious Agent Detection for Robust Multi-Agent Collaborative Perception [52.261231738242266]
Multi-agent collaborative (MAC) perception is more vulnerable to adversarial attacks than single-agent perception.
We propose Malicious Agent Detection (MADE), a reactive defense specific to MAC perception.
We conduct comprehensive evaluations on a benchmark 3D dataset V2X-sim and a real-road dataset DAIR-V2X.
arXiv Detail & Related papers (2023-10-18T11:36:42Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary
Backdoor Pattern Types Using a Maximum Margin Statistic [27.62279831135902]
We propose a post-training defense that detects backdoor attacks with arbitrary types of backdoor embeddings.
Our detector does not need any legitimate clean samples, and can efficiently detect backdoor attacks with arbitrary numbers of source classes.
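The entry names the core quantity, a maximum margin statistic, but not how it is obtained. One plausible, heavily simplified reading is sketched below: for each putative target class, estimate the largest achievable margin between that class's logit and the runner-up over the input domain (no clean samples required), then flag classes whose maximum margin is anomalously large. The dummy linear logits, the random search, and the MAD threshold are all assumptions for illustration, not the MM-BD procedure itself.

```python
"""Hypothetical sketch of a maximum-margin-statistic backdoor check.

Assumptions: a fixed (dummy) logit function stands in for the inspected model,
the per-class maximum margin is estimated by random search over the input
domain (so no clean samples are needed), and an anomalously large maximum
margin for some class is treated as evidence of a backdoor.
"""
import numpy as np


def logits(x, w, b):
    """Dummy linear 'classifier' standing in for the inspected model."""
    return x @ w + b


def max_margin(target, w, b, rng, trials=2000, dim=32):
    """Estimate max over inputs of (logit[target] - largest other logit)."""
    best = -np.inf
    for _ in range(trials):
        x = rng.uniform(0.0, 1.0, size=dim)
        z = logits(x, w, b)
        margin = z[target] - np.max(np.delete(z, target))
        best = max(best, margin)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, dim = 10, 32
    w = rng.normal(scale=0.1, size=(dim, num_classes))
    b = rng.normal(scale=0.1, size=num_classes)
    w[:, 4] += 0.5  # simulate a target class with an inflated achievable margin
    margins = np.array([max_margin(t, w, b, rng) for t in range(num_classes)])
    med = np.median(margins)
    mad = np.median(np.abs(margins - med)) + 1e-12
    scores = (margins - med) / (1.4826 * mad)
    print("suspect target classes:", np.where(scores > 3.0)[0].tolist())
```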
arXiv Detail & Related papers (2022-05-13T21:32:24Z) - Post-Training Detection of Backdoor Attacks for Two-Class and
Multi-Attack Scenarios [22.22337220509128]
Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers.
We propose a detection framework based on backdoor pattern (BP) reverse-engineering and a novel expected transferability (ET) statistic.
arXiv Detail & Related papers (2022-01-20T22:21:38Z) - Learning to Detect Adversarial Examples Based on Class Scores [0.8411385346896413]
We take a closer look at adversarial attack detection based on the class scores of an already trained classification model.
We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples.
We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement.
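The stated mechanism, training an SVM on the classifier's class scores, is concrete enough to sketch. Below is a minimal, hypothetical scikit-learn example; the synthetic score vectors and the RBF kernel are stand-ins for demonstration, not the paper's experimental setup.

```python
"""Minimal, hypothetical sketch: an SVM over class-score vectors.

Assumptions: class scores (softmax outputs) for clean and adversarial inputs
are already available; here they are synthesized so the script runs end to
end. A real setup would collect them from the protected classifier.
"""
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)


def synth_scores(n, num_classes, peaked):
    """Synthesize softmax-like score vectors: 'peaked' mimics confident
    predictions on clean inputs, the alternative mimics the flatter scores
    that adversarial examples often produce."""
    alpha = np.full(num_classes, 0.1 if peaked else 1.0)
    return rng.dirichlet(alpha, size=n)


clean = synth_scores(500, 10, peaked=True)
adv = synth_scores(500, 10, peaked=False)
X = np.vstack([clean, adv])
y = np.concatenate([np.zeros(len(clean)), np.ones(len(adv))])  # 1 = adversarial

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
detector = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)
print("held-out detection accuracy:", round(detector.score(X_te, y_te), 3))
```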
arXiv Detail & Related papers (2021-07-09T13:29:54Z) - ExAD: An Ensemble Approach for Explanation-based Adversarial Detection [17.455233006559734]
We propose ExAD, a framework to detect adversarial examples using an ensemble of explanation techniques.
We evaluate our approach using six state-of-the-art adversarial attacks on three image datasets.
arXiv Detail & Related papers (2021-03-22T00:53:07Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the face presentation attack detection (fPAD) task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z) - Adversarial Detection and Correction by Matching Prediction
Distributions [0.0]
The detector almost completely neutralises powerful attacks like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST.
We show that our method is still able to detect the adversarial examples in the case of a white-box attack where the attacker has full knowledge of both the model and the defence.
arXiv Detail & Related papers (2020-02-21T15:45:42Z)
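The entry reports results without restating the mechanism; going by the title alone, one plausible reading is that an input is flagged when the model's predictive distribution on the input diverges from its distribution on a "corrected" (e.g., autoencoder-reconstructed) version of that input. The sketch below illustrates that reading with dummy components; the linear classifier, the clipping-based reconstruction, and the KL-divergence score are assumptions, not the paper's method.

```python
"""Hypothetical sketch of detection by matching prediction distributions.

Assumption (based only on the entry's title): an input is scored by the KL
divergence between the classifier's predictive distribution on the input and
on a "corrected" version of it; large divergence suggests an adversarial
example. The classifier and reconstructor below are dummies so the script
runs end to end.
"""
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def classifier(x, w):
    """Dummy linear classifier standing in for the protected model."""
    return softmax(x @ w)


def reconstruct(x):
    """Dummy 'correction' step standing in for an autoencoder."""
    return np.clip(x, 0.0, 1.0) * 0.9


def kl_divergence(p, q, eps=1e-12):
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


def adversarial_score(x, w):
    p = classifier(x, w)
    q = classifier(reconstruct(x), w)
    return kl_divergence(p, q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(784, 10))
    x_clean = rng.uniform(0.0, 1.0, size=784)
    x_adv = x_clean + rng.normal(scale=0.5, size=784)  # crude stand-in perturbation
    print("clean score:", round(adversarial_score(x_clean, w), 3))
    print("adv   score:", round(adversarial_score(x_adv, w), 3))
```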