Adversarial Profiles: Detecting Out-Distribution & Adversarial Samples
in Pre-trained CNNs
- URL: http://arxiv.org/abs/2011.09123v1
- Date: Wed, 18 Nov 2020 07:10:13 GMT
- Title: Adversarial Profiles: Detecting Out-Distribution & Adversarial Samples
in Pre-trained CNNs
- Authors: Arezoo Rajabi, Rakesh B. Bobba
- Abstract summary: We propose a method to detect adversarial and out-distribution examples against a pre-trained CNN.
To this end, we create adversarial profiles for each class using only one adversarial attack generation technique.
Our initial evaluation of this approach using the MNIST dataset shows that adversarial-profile-based detection is effective in detecting at least 92% of out-distribution examples and 59% of adversarial examples.
- Score: 4.52308938611108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their high accuracy, Convolutional Neural Networks (CNNs) are
vulnerable to adversarial and out-distribution examples. Many methods have been
proposed to detect these fooling examples or to make CNNs robust against them.
However, most such methods need access to a wide range of fooling examples to
retrain the network or to tune detection parameters. Here, we
propose a method to detect adversarial and out-distribution examples against a
pre-trained CNN without needing to retrain the CNN or needing access to a wide
variety of fooling examples. To this end, we create adversarial profiles for
each class using only one adversarial attack generation technique. We then wrap
a detector around the pre-trained CNN that applies the created adversarial
profile to each input and uses the output to decide whether or not the input is
legitimate. Our initial evaluation of this approach using the MNIST dataset
shows that adversarial-profile-based detection is effective in detecting at
least 92% of out-distribution examples and 59% of adversarial examples.
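The abstract does not spell out how a class's adversarial profile is applied at test time. The following is a minimal sketch of one plausible reading, assuming that the profile for class c is a set of targeted perturbations (one per other class) crafted offline with a single attack, together with the label each perturbation is expected to produce on legitimate class-c inputs; the class name AdversarialProfileDetector, the dictionary layout, and the agreement threshold are illustrative assumptions, not the authors' implementation.

    import numpy as np

    class AdversarialProfileDetector:
        """Hedged sketch of adversarial-profile-based detection (assumed design)."""

        def __init__(self, cnn, profiles, min_agreement=0.5):
            # cnn           : pre-trained classifier, maps an input array to a class label
            # profiles      : dict mapping class c -> {target class t: perturbation delta},
            #                 where delta was crafted so legitimate class-c inputs move to t
            # min_agreement : fraction of expected transitions that must be reproduced
            self.cnn = cnn
            self.profiles = profiles
            self.min_agreement = min_agreement

        def is_legitimate(self, x):
            c = self.cnn(x)  # predicted class of the raw input
            matches, total = 0, 0
            for target, delta in self.profiles[c].items():
                total += 1
                # a genuine class-c input should follow the profile and land on `target`
                if self.cnn(np.clip(x + delta, 0.0, 1.0)) == target:
                    matches += 1
            # adversarial or out-distribution inputs are assumed to deviate from the
            # class-conditional profile, so low agreement triggers rejection
            return total > 0 and matches / total >= self.min_agreement

Under this assumed reading, an out-distribution or adversarial input rarely follows the expected class-conditional transitions, so only a small fraction of the profile perturbations land on their intended targets and the detector rejects the input.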
Related papers
- CausAdv: A Causal-based Framework for Detecting Adversarial Examples [0.0]
Convolutional neural networks (CNNs) are vulnerable to crafted adversarial perturbations in inputs.
These inputs appear almost indistinguishable from natural images, yet they are incorrectly classified by CNN architectures.
We propose CausAdv: a causal framework for detecting adversarial examples based on counterfactual reasoning.
arXiv Detail & Related papers (2024-10-29T22:57:48Z)
- Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial Detection [22.99930028876662]
Convolutional neural networks (CNNs) define the state-of-the-art solution on many perceptual tasks.
Current CNN approaches largely remain vulnerable to adversarial perturbations of the input that have been crafted specifically to fool the system.
We propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks.
arXiv Detail & Related papers (2022-12-13T17:51:32Z)
- Trace and Detect Adversarial Attacks on CNNs using Feature Response Maps [0.3437656066916039]
In this work, we propose a novel detection method for adversarial examples to prevent adversarial attacks on convolutional neural networks (CNNs).
We do so by tracking adversarial perturbations in feature responses, allowing for automatic detection using average local spatial entropy.
arXiv Detail & Related papers (2022-08-24T11:05:04Z)
- Detect and Defense Against Adversarial Examples in Deep Learning using Natural Scene Statistics and Adaptive Denoising [12.378017309516965]
We propose a framework for defending DNNs against adversarial samples.
The detector aims to detect AEs by characterizing them through the use of natural scene statistics.
The proposed method outperforms the state-of-the-art defense techniques.
arXiv Detail & Related papers (2021-07-12T23:45:44Z)
- Adversarial Examples Detection with Bayesian Neural Network [57.185482121807716]
We propose a new framework to detect adversarial examples, motivated by the observation that random components can improve the smoothness of predictors.
We propose a novel Bayesian adversarial example detector, BATer for short, to improve the performance of adversarial example detection.
arXiv Detail & Related papers (2021-05-18T15:51:24Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study the robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- Detecting Adversarial Examples by Input Transformations, Defense Perturbations, and Voting [71.57324258813674]
Convolutional neural networks (CNNs) have proved to reach super-human performance in visual recognition tasks.
CNNs can easily be fooled by adversarial examples, i.e., maliciously-crafted images that force the networks to predict an incorrect output.
This paper extensively explores the detection of adversarial examples via image transformations and proposes a novel methodology.
arXiv Detail & Related papers (2021-01-27T14:50:41Z)
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
In this paper, we consider the non-robust features as a common property of adversarial examples, and we deduce it is possible to find a cluster in representation space corresponding to the property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the face presentation attack detection (fPAD) task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)
- Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble [163.3333439344695]
Dirichlet Neighborhood Ensemble (DNE) is a randomized smoothing method for training a robust model to defend against substitution-based attacks.
DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data (a hedged sketch of this sampling step appears after this entry).
We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
arXiv Detail & Related papers (2020-06-20T18:01:16Z)
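The Dirichlet Neighborhood Ensemble entry above describes its core step concretely enough to illustrate: a virtual embedding for each word is sampled from the convex hull spanned by the word and its synonyms, with the convex weights drawn from a Dirichlet distribution. The sketch below covers that sampling step only; the function name sample_virtual_embedding, the concentration parameter alpha, and the toy embeddings are assumptions for illustration, not the paper's code.

    import numpy as np

    def sample_virtual_embedding(word_vec, synonym_vecs, alpha=1.0, rng=None):
        # word_vec     : (d,) embedding of the original word
        # synonym_vecs : (k, d) embeddings of its synonyms
        # alpha        : Dirichlet concentration; smaller values stay closer to the vertices
        rng = np.random.default_rng() if rng is None else rng
        vertices = np.vstack([word_vec[None, :], synonym_vecs])   # (k+1, d) convex-hull vertices
        weights = rng.dirichlet(alpha * np.ones(len(vertices)))   # non-negative weights summing to 1
        return weights @ vertices                                 # a point inside the convex hull

    # Toy usage: a 4-dimensional word embedding with two synonyms.
    w = np.array([0.2, -0.1, 0.5, 0.0])
    syn = np.array([[0.25, -0.05, 0.45, 0.10],
                    [0.15, -0.20, 0.55, -0.05]])
    virtual = sample_virtual_embedding(w, syn)

Repeating this per word yields the virtual sentences that DNE mixes into training as a smoothing-style augmentation.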
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.