ExAD: An Ensemble Approach for Explanation-based Adversarial Detection
- URL: http://arxiv.org/abs/2103.11526v1
- Date: Mon, 22 Mar 2021 00:53:07 GMT
- Title: ExAD: An Ensemble Approach for Explanation-based Adversarial Detection
- Authors: Raj Vardhan, Ninghao Liu, Phakpoom Chinprutthiwong, Weijie Fu, Zhenyu
Hu, Xia Ben Hu, Guofei Gu
- Abstract summary: We propose ExAD, a framework to detect adversarial examples using an ensemble of explanation techniques.
We evaluate our approach using six state-of-the-art adversarial attacks on three image datasets.
- Score: 17.455233006559734
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent research has shown Deep Neural Networks (DNNs) to be vulnerable to
adversarial examples that induce desired misclassifications in the models. Such
risks impede the application of machine learning in security-sensitive domains.
Several defense methods have been proposed against adversarial attacks to
detect adversarial examples at test time or to make machine learning models
more robust. However, while existing methods are quite effective under the
blackbox threat model, where the attacker is not aware of the defense, they are
relatively ineffective under the whitebox threat model, where the attacker has
full knowledge of the defense.
In this paper, we propose ExAD, a framework to detect adversarial examples
using an ensemble of explanation techniques. Each explanation technique in ExAD
produces an explanation map identifying the relevance of input variables for
the model's classification. For every class in a dataset, the system includes a
detector network, corresponding to each explanation technique, which is trained
to distinguish between normal and abnormal explanation maps. At test time, if
the explanation map of an input is detected as abnormal by any detector model
of the classified class, then we consider the input to be an adversarial
example. We evaluate our approach using six state-of-the-art adversarial
attacks on three image datasets. Our extensive evaluation shows that our
mechanism can effectively detect these attacks under the blackbox threat model
with limited false positives. Furthermore, we find that our approach achieves
promising results in limiting the success rate of whitebox attacks.
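To make the detection rule above concrete, here is a minimal sketch of the ensemble decision as described in the abstract, not the authors' released code: the target model classifies the input, each explanation technique produces a map for that prediction, and the input is flagged as adversarial as soon as any of the predicted class's detectors scores its map as abnormal. The callable and dictionary names (target_model, explanation_fns, detectors) and the 0.5 abnormality threshold are illustrative assumptions.

```python
import numpy as np

def exad_detect(x, target_model, explanation_fns, detectors, threshold=0.5):
    """Sketch of the ExAD decision rule described in the abstract.

    target_model: callable mapping an input to a vector of class scores.
    explanation_fns: dict {technique_name: fn(model, x, class_id) -> explanation map}.
    detectors: dict {(class_id, technique_name): fn(map) -> abnormality score in [0, 1]},
               one detector per (class, explanation technique) pair.
    """
    # Classify the (possibly adversarial) input with the target model.
    pred_class = int(np.argmax(target_model(x)))

    for name, explain in explanation_fns.items():
        # Explanation map identifying which input variables drove the prediction.
        exp_map = explain(target_model, x, pred_class)
        # The detector for this class/technique pair scores how abnormal the map looks.
        abnormality = detectors[(pred_class, name)](exp_map)
        if abnormality > threshold:
            # A single abnormal map is enough to flag the input as adversarial.
            return True, pred_class

    return False, pred_class
```

Treating the ensemble as a logical OR over detectors mirrors the rule stated in the abstract: one abnormal explanation map for the classified class suffices to reject the input.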
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with a detection rate above 99% within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z)
- Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack [53.032801921915436]
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars.
Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the very first black-box adversarial attack approach in skeleton-based HAR called BASAR.
arXiv Detail & Related papers (2022-11-21T09:51:28Z)
- RamBoAttack: A Robust Query Efficient Deep Neural Network Decision Exploit [9.93052896330371]
We develop a robust query efficient attack capable of avoiding entrapment in a local minimum and misdirection from noisy gradients.
The RamBoAttack is more robust to the different sample inputs available to an adversary and the targeted class.
arXiv Detail & Related papers (2021-12-10T01:25:24Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- Learning to Detect Adversarial Examples Based on Class Scores [0.8411385346896413]
We take a closer look at adversarial attack detection based on the class scores of an already trained classification model.
We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples.
We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement; a toy sketch of the idea appears after this entry.
arXiv Detail & Related papers (2021-07-09T13:29:54Z)
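As a rough illustration of the class-score idea from the entry above (not the paper's implementation), the sketch below fits a scikit-learn SVM on precomputed softmax score vectors; the file names, RBF kernel choice, and 0.5 threshold are assumptions for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Assumed inputs: softmax class-score vectors produced by the target classifier
# for clean and adversarial examples (shapes: [n_clean, n_classes], [n_adv, n_classes]).
clean_scores = np.load("clean_scores.npy")      # placeholder path
adv_scores = np.load("adversarial_scores.npy")  # placeholder path

X = np.vstack([clean_scores, adv_scores])
y = np.concatenate([np.zeros(len(clean_scores)), np.ones(len(adv_scores))])

# An RBF-kernel SVM learns to separate clean from adversarial score vectors.
detector = SVC(kernel="rbf", probability=True).fit(X, y)

def is_adversarial(score_vector, threshold=0.5):
    """Flag an input as adversarial from its class-score vector alone."""
    return detector.predict_proba(score_vector.reshape(1, -1))[0, 1] > threshold
```

At test time only the classifier's score vector for the incoming input is needed, which keeps such a detector cheap to deploy.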
- BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability [12.079529913120593]
Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack.
We take inspiration from the concept of Applicability Domain in cheminformatics.
We propose a simple yet robust triple-stage data-driven framework that checks the input globally and locally.
arXiv Detail & Related papers (2021-05-02T15:24:33Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by a recently introduced non-robust feature.
In this paper, we consider non-robust features as a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
- Detection Defense Against Adversarial Attacks with Saliency Map [7.736844355705379]
It is well established that neural networks are vulnerable to adversarial examples, which are almost imperceptible to human vision.
Existing defenses tend to harden the robustness of models against adversarial attacks.
We propose a novel method that combines additional noise with an inconsistency strategy to detect adversarial examples.
arXiv Detail & Related papers (2020-09-06T13:57:17Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the face presentation attack detection (fPAD) task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)
- Adversarial Detection and Correction by Matching Prediction Distributions [0.0]
The detector almost completely neutralises powerful attacks like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST.
We show that our method is still able to detect the adversarial examples in the case of a white-box attack where the attacker has full knowledge of both the model and the defence.
arXiv Detail & Related papers (2020-02-21T15:45:42Z)