EAD: an ensemble approach to detect adversarial examples from the hidden
features of deep neural networks
- URL: http://arxiv.org/abs/2111.12631v2
- Date: Thu, 25 Nov 2021 11:24:28 GMT
- Authors: Francesco Craighero, Fabrizio Angaroni, Fabio Stella, Chiara Damiani,
Marco Antoniotti, Alex Graudenzi
- Abstract summary: We propose an Ensemble Adversarial Detector (EAD) for the identification of adversarial examples.
EAD combines multiple detectors that exploit distinct properties of the input instances in the internal representation of a pre-trained Deep Neural Network (DNN).
We show that EAD achieves the best AUROC and AUPR in the large majority of the settings and comparable performance in the others.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key challenges in Deep Learning is the definition of effective
strategies for the detection of adversarial examples. To this end, we propose a
novel approach named Ensemble Adversarial Detector (EAD) for the identification
of adversarial examples, in a standard multiclass classification scenario. EAD
combines multiple detectors that exploit distinct properties of the input
instances in the internal representation of a pre-trained Deep Neural Network
(DNN). Specifically, EAD integrates the state-of-the-art detectors based on
Mahalanobis distance and on Local Intrinsic Dimensionality (LID) with a newly
introduced method based on One-class Support Vector Machines (OSVMs). Although
all constituent methods share the assumption that the greater the distance of a
test instance from the set of correctly classified training instances, the
higher the probability that it is an adversarial example, they differ in how
this distance is computed. To exploit the effectiveness of the different
methods in capturing distinct properties of the data distribution and, in turn,
to tackle the trade-off between generalization and overfitting, EAD employs the
detector-specific distance scores as features of a logistic regression
classifier, after independent hyperparameter optimization.
We evaluated the EAD approach on distinct datasets (CIFAR-10, CIFAR-100 and
SVHN) and models (ResNet and DenseNet) and with regard to four adversarial
attacks (FGSM, BIM, DeepFool and CW), also by comparing with competing
approaches. Overall, we show that EAD achieves the best AUROC and AUPR in the
large majority of the settings and comparable performance in the others. The
improvement over the state-of-the-art, and the possibility to easily extend EAD
to include any arbitrary set of detectors, pave the way to a widespread
adoption of ensemble approaches in the broad field of adversarial example
detection.
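The ensemble mechanism described in the abstract (per-detector distance scores stacked as features of a logistic regression classifier) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the "hidden features" are synthetic, the Mahalanobis and one-class SVM detectors are toy versions, and the LID detector is omitted for brevity.

```python
# Sketch of the EAD idea: each detector maps hidden features to a distance
# score, and a logistic regression combines the scores into one detector.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Toy "hidden features": clean activations vs. (shifted) adversarial ones.
clean = rng.normal(0.0, 1.0, size=(200, 8))
adv = rng.normal(1.5, 1.0, size=(200, 8))

# Detector 1: Mahalanobis distance to a Gaussian fit on clean features.
mu = clean.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(clean, rowvar=False) + 1e-6 * np.eye(8))

def mahalanobis(x):
    d = x - mu
    # Row-wise sqrt(d @ cov_inv @ d) via einsum.
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

# Detector 2: one-class SVM fit on clean features only.
osvm = OneClassSVM(gamma="scale", nu=0.1).fit(clean)

def osvm_score(x):
    # Negate so that higher means "more anomalous".
    return -osvm.decision_function(x)

# Stack detector-specific scores as features; label 1 = adversarial.
X = np.vstack([clean, adv])
scores = np.column_stack([mahalanobis(X), osvm_score(X)])
y = np.concatenate([np.zeros(len(clean)), np.ones(len(adv))])

meta = LogisticRegression().fit(scores, y)
print(meta.score(scores, y))  # training accuracy of the ensemble
```

In the paper's setting, each detector would also be tuned independently before the meta-classifier is fit; here both detectors use defaults for brevity.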
Related papers
- Self-Supervised Representation Learning for Adversarial Attack Detection [6.528181610035978]
Supervised learning-based adversarial attack detection methods rely on large amounts of labeled data.
We propose a self-supervised representation learning framework for the adversarial attack detection task to address this drawback.
arXiv Detail & Related papers (2024-07-05T09:37:16Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Benchmarking Deep Models for Salient Object Detection [67.07247772280212]
We construct a general SALient Object Detection (SALOD) benchmark to conduct a comprehensive comparison among several representative SOD methods.
In these experiments, we find that existing loss functions are usually specialized for some metrics but yield inferior results on others.
We propose a novel Edge-Aware (EA) loss that promotes deep networks to learn more discriminative features by integrating both pixel- and image-level supervision signals.
arXiv Detail & Related papers (2022-02-07T03:43:16Z)
- PARL: Enhancing Diversity of Ensemble Networks to Resist Adversarial Attacks via Pairwise Adversarially Robust Loss Function [13.417003144007156]
Adversarial attacks tend to rely on the principle of transferability.
Ensemble methods against adversarial attacks demonstrate that an adversarial example is less likely to mislead multiple classifiers.
Recent ensemble methods have either been shown to be vulnerable to stronger adversaries or shown to lack an end-to-end evaluation.
arXiv Detail & Related papers (2021-12-09T14:26:13Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot Classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- Learning to Detect Adversarial Examples Based on Class Scores [0.8411385346896413]
We take a closer look at adversarial attack detection based on the class scores of an already trained classification model.
We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples.
We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement.
arXiv Detail & Related papers (2021-07-09T13:29:54Z)
- Self-Supervised Adversarial Example Detection by Disentangled Representation [16.98476232162835]
We train an autoencoder, assisted by a discriminator network, on both correctly paired and incorrectly paired class/semantic features to reconstruct benign examples and counterexamples.
This mimics the behavior of adversarial examples and reduces the unnecessary generalization ability of the autoencoder.
Compared with the state-of-the-art self-supervised detection methods, our method exhibits better performance in various measurements.
arXiv Detail & Related papers (2021-05-08T12:48:18Z)
- Adversarial Unsupervised Domain Adaptation Guided with Deep Clustering for Face Presentation Attack Detection [0.8701566919381223]
Face Presentation Attack Detection (PAD) has drawn increasing attention as a means to secure face recognition systems.
We propose an end-to-end learning framework based on Domain Adaptation (DA) to improve PAD generalization capability.
arXiv Detail & Related papers (2021-02-13T05:34:40Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
- Cross-domain Face Presentation Attack Detection via Multi-domain Disentangled Representation Learning [109.42987031347582]
Face presentation attack detection (PAD) has been an urgent problem to solve in face recognition systems.
We propose an efficient disentangled representation learning method for cross-domain face PAD.
Our approach consists of disentangled representation learning (DR-Net) and multi-domain learning (MD-Net).
arXiv Detail & Related papers (2020-04-04T15:45:14Z)
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other, unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.