Adversarial Profiles: Detecting Out-Distribution & Adversarial Samples
in Pre-trained CNNs
- URL: http://arxiv.org/abs/2011.09123v1
- Date: Wed, 18 Nov 2020 07:10:13 GMT
- Title: Adversarial Profiles: Detecting Out-Distribution & Adversarial Samples
in Pre-trained CNNs
- Authors: Arezoo Rajabi, Rakesh B. Bobba
- Abstract summary: We propose a method to detect adversarial and out-distribution examples against a pre-trained CNN.
To this end, we create adversarial profiles for each class using only one adversarial attack generation technique.
Our initial evaluation of this approach using the MNIST dataset shows that adversarial-profile-based detection is effective in detecting at least 92% of out-distribution examples and 59% of adversarial examples.
- Score: 4.52308938611108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their high accuracy, Convolutional Neural Networks (CNNs) are
vulnerable to adversarial and out-distribution examples. Many methods have been
proposed to detect these fooling examples or to make CNNs robust against them.
However, most such methods need access to a wide range of fooling examples to
retrain the network or to tune detection parameters. Here, we
propose a method to detect adversarial and out-distribution examples against a
pre-trained CNN without needing to retrain the CNN or needing access to a wide
variety of fooling examples. To this end, we create adversarial profiles for
each class using only one adversarial attack generation technique. We then wrap
a detector around the pre-trained CNN that applies the created adversarial
profile to each input and uses the output to decide whether or not the input is
legitimate. Our initial evaluation of this approach using the MNIST dataset
shows that adversarial-profile-based detection is effective in detecting at
least 92% of out-distribution examples and 59% of adversarial examples.
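The abstract does not spell out how a class's adversarial profile is applied at test time. The following is a minimal sketch of one plausible reading, assuming that the profile for class c is a set of targeted perturbations (one per other class) crafted offline with a single attack, together with the label each perturbation is expected to produce on legitimate class-c inputs; the class name AdversarialProfileDetector, the dictionary layout, and the agreement threshold are illustrative assumptions, not the authors' implementation.

    import numpy as np

    class AdversarialProfileDetector:
        """Hedged sketch of adversarial-profile-based detection (assumed design)."""

        def __init__(self, cnn, profiles, min_agreement=0.5):
            # cnn           : pre-trained classifier, maps an input array to a class label
            # profiles      : dict mapping class c -> {target class t: perturbation delta},
            #                 where delta was crafted so legitimate class-c inputs move to t
            # min_agreement : fraction of expected transitions that must be reproduced
            self.cnn = cnn
            self.profiles = profiles
            self.min_agreement = min_agreement

        def is_legitimate(self, x):
            c = self.cnn(x)  # predicted class of the raw input
            matches, total = 0, 0
            for target, delta in self.profiles[c].items():
                total += 1
                # a genuine class-c input should follow the profile and land on `target`
                if self.cnn(np.clip(x + delta, 0.0, 1.0)) == target:
                    matches += 1
            # adversarial or out-distribution inputs are assumed to deviate from the
            # class-conditional profile, so low agreement triggers rejection
            return total > 0 and matches / total >= self.min_agreement

Under this assumed reading, an out-distribution or adversarial input rarely follows the expected class-conditional transitions, so only a small fraction of the profile perturbations land on their intended targets and the detector rejects the input.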
Related papers
- CausAdv: A Causal-based Framework for Detecting Adversarial Examples [0.0]
Convolutional neural networks (CNNs) are vulnerable to crafted adversarial perturbations in inputs.
These inputs appear almost indistinguishable from natural images, yet they are incorrectly classified by CNN architectures.
We propose CausAdv: a causal framework for detecting adversarial examples based on counterfactual reasoning.
arXiv Detail & Related papers (2024-10-29T22:57:48Z)
- Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial Detection [22.99930028876662]
Convolutional neural networks (CNNs) define the state-of-the-art solution on many perceptual tasks.
Current CNN approaches largely remain vulnerable to adversarial perturbations of the input that have been crafted specifically to fool the system.
We propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks.
arXiv Detail & Related papers (2022-12-13T17:51:32Z)
- Trace and Detect Adversarial Attacks on CNNs using Feature Response Maps [0.3437656066916039]
In this work, we propose a novel detection method for adversarial examples to prevent adversarial attacks on convolutional neural networks (CNNs).
We do so by tracking adversarial perturbations in feature responses, allowing for automatic detection using average local spatial entropy.
arXiv Detail & Related papers (2022-08-24T11:05:04Z)
- Detect and Defense Against Adversarial Examples in Deep Learning using Natural Scene Statistics and Adaptive Denoising [12.378017309516965]
We propose a framework for defending DNNs against adversarial samples.
The detector aims to detect AEs by characterizing them through the use of natural scene statistics.
The proposed method outperforms the state-of-the-art defense techniques.
arXiv Detail & Related papers (2021-07-12T23:45:44Z)
- Adversarial Examples Detection with Bayesian Neural Network [57.185482121807716]
We propose a new framework to detect adversarial examples, motivated by the observation that random components can improve the smoothness of predictors.
We propose a novel Bayesian adversarial example detector, BATer for short, to improve the performance of adversarial example detection.
arXiv Detail & Related papers (2021-05-18T15:51:24Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study the robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- Detecting Adversarial Examples by Input Transformations, Defense Perturbations, and Voting [71.57324258813674]
Convolutional neural networks (CNNs) have proved to reach super-human performance in visual recognition tasks.
CNNs can easily be fooled by adversarial examples, i.e., maliciously-crafted images that force the networks to predict an incorrect output.
This paper extensively explores the detection of adversarial examples via image transformations and proposes a novel methodology.
arXiv Detail & Related papers (2021-01-27T14:50:41Z)
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
In this paper, we consider the non-robust features as a common property of adversarial examples, and we deduce it is possible to find a cluster in representation space corresponding to the property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the face presentation attack detection (fPAD) task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)
- Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble [163.3333439344695]
Dirichlet Neighborhood Ensemble (DNE) is a randomized smoothing method for training a robust model to defend against substitution-based attacks.
DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data (a hedged sketch of this sampling step appears after this entry).
We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
arXiv Detail & Related papers (2020-06-20T18:01:16Z)
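The Dirichlet Neighborhood Ensemble entry above describes its core step concretely enough to illustrate: a virtual embedding for each word is sampled from the convex hull spanned by the word and its synonyms, with the convex weights drawn from a Dirichlet distribution. The sketch below covers that sampling step only; the function name sample_virtual_embedding, the concentration parameter alpha, and the toy embeddings are assumptions for illustration, not the paper's code.

    import numpy as np

    def sample_virtual_embedding(word_vec, synonym_vecs, alpha=1.0, rng=None):
        # word_vec     : (d,) embedding of the original word
        # synonym_vecs : (k, d) embeddings of its synonyms
        # alpha        : Dirichlet concentration; smaller values stay closer to the vertices
        rng = np.random.default_rng() if rng is None else rng
        vertices = np.vstack([word_vec[None, :], synonym_vecs])   # (k+1, d) convex-hull vertices
        weights = rng.dirichlet(alpha * np.ones(len(vertices)))   # non-negative weights summing to 1
        return weights @ vertices                                 # a point inside the convex hull

    # Toy usage: a 4-dimensional word embedding with two synonyms.
    w = np.array([0.2, -0.1, 0.5, 0.0])
    syn = np.array([[0.25, -0.05, 0.45, 0.10],
                    [0.15, -0.20, 0.55, -0.05]])
    virtual = sample_virtual_embedding(w, syn)

Repeating this per word yields the virtual sentences that DNE mixes into training as a smoothing-style augmentation.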
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.