Detecting Adversarial Examples by Input Transformations, Defense
Perturbations, and Voting
- URL: http://arxiv.org/abs/2101.11466v1
- Date: Wed, 27 Jan 2021 14:50:41 GMT
- Title: Detecting Adversarial Examples by Input Transformations, Defense
Perturbations, and Voting
- Authors: Federico Nesti, Alessandro Biondi, Giorgio Buttazzo
- Abstract summary: convolutional neural networks (CNNs) have proved to reach super-human performance in visual recognition tasks.
CNNs can easily be fooled by adversarial examples, i.e., maliciously-crafted images that force the networks to predict an incorrect output.
This paper extensively explores the detection of adversarial examples via image transformations and proposes a novel methodology.
- Score: 71.57324258813674
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Over the last few years, convolutional neural networks (CNNs) have proved to
reach super-human performance in visual recognition tasks. However, CNNs can
easily be fooled by adversarial examples, i.e., maliciously-crafted images that
force the networks to predict an incorrect output while being extremely similar
to those for which a correct output is predicted. Regular adversarial examples
are not robust to input image transformations, which can then be used to detect
whether an adversarial example is presented to the network. Nevertheless, it is
still possible to generate adversarial examples that are robust to such
transformations.
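As an illustration of this transformation-based detection idea, the following is a minimal sketch, assuming a PyTorch classifier and a simple downscale-upscale transformation; the transformation choice and function names are illustrative assumptions, not the paper's exact procedure. An input whose predicted class changes after the transformation is flagged as a suspected adversarial example.

```python
import torch
import torch.nn.functional as F


def flag_by_transformation(model, x, scale=0.5):
    """Flag suspected adversarial inputs via prediction consistency.

    model: a PyTorch nn.Module classifier mapping (N, C, H, W) images to logits.
    x:     a batch of images, shape (N, C, H, W).
    Returns a boolean tensor of shape (N,): True where the predicted class
    changes under a simple downscale-upscale transformation (an assumed,
    illustrative transformation).
    """
    model.eval()
    with torch.no_grad():
        y_plain = model(x).argmax(dim=1)
        h, w = x.shape[-2:]
        # Downscale and upscale the image; regular adversarial perturbations
        # often do not survive this kind of resampling.
        x_small = F.interpolate(x, scale_factor=scale, mode="bilinear",
                                align_corners=False)
        x_back = F.interpolate(x_small, size=(h, w), mode="bilinear",
                               align_corners=False)
        y_trans = model(x_back).argmax(dim=1)
    # Disagreement between the two predictions is treated as a detection.
    return y_plain != y_trans
```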
This paper extensively explores the detection of adversarial examples via
image transformations and proposes a novel methodology, called \textit{defense
perturbation}, to detect robust adversarial examples with the same input
transformations the adversarial examples are robust to. Such a \textit{defense
perturbation} is shown to be an effective counter-measure to robust adversarial
examples.
Furthermore, multi-network adversarial examples are introduced. This kind of
adversarial examples can be used to simultaneously fool multiple networks,
which is critical in systems that use network redundancy, such as those based
on architectures with majority voting over multiple CNNs. An extensive set of
experiments based on state-of-the-art CNNs trained on the ImageNet dataset is
finally reported.
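Since the abstract refers to redundancy schemes based on majority voting over multiple CNNs, the sketch below shows a generic voting ensemble in PyTorch; it is an illustrative assumption (function name, ensemble composition), not the specific architecture evaluated in the paper.

```python
import torch


def majority_vote(models, x):
    """Classify a batch of images by majority vote over several CNNs.

    models: a list of PyTorch classifiers, each mapping (N, C, H, W) images
            to logits (e.g., independently trained or differently seeded CNNs).
    x:      a batch of images, shape (N, C, H, W).
    Returns the per-image majority class, shape (N,).
    """
    with torch.no_grad():
        # Stack per-model class predictions into shape (num_models, N).
        preds = torch.stack([m(x).argmax(dim=1) for m in models], dim=0)
    # torch.mode along the model axis returns the most frequent class per image.
    return torch.mode(preds, dim=0).values
```

A multi-network adversarial example, in the sense of the abstract above, is crafted so that every member of `models` misclassifies it, which is exactly why such redundancy alone is not a sufficient defense.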
Related papers
- A Geometrical Approach to Evaluate the Adversarial Robustness of Deep
Neural Networks [52.09243852066406]
The Adversarial Converging Time Score (ACTS) measures convergence time as an adversarial robustness metric.
We validate the effectiveness and generalization of the proposed ACTS metric against different adversarial attacks on the large-scale ImageNet dataset.
arXiv Detail & Related papers (2023-10-10T09:39:38Z)
- Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial Detection [22.99930028876662]
Convolutional neural networks (CNNs) define the state-of-the-art solution on many perceptual tasks.
Current CNN approaches largely remain vulnerable to adversarial perturbations of the input that have been crafted specifically to fool the system.
We propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks.
arXiv Detail & Related papers (2022-12-13T17:51:32Z)
- Block-Sparse Adversarial Attack to Fool Transformer-Based Text Classifiers [49.50163349643615]
In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers.
Experimental results demonstrate that, while our adversarial attack maintains the semantics of the sentence, it can reduce the accuracy of GPT-2 to less than 5%.
arXiv Detail & Related papers (2022-03-11T14:37:41Z)
- Adversarial Examples Detection with Bayesian Neural Network [57.185482121807716]
We propose a new framework to detect adversarial examples, motivated by the observation that random components can improve the smoothness of predictors.
Specifically, we propose a novel Bayesian adversarial example detector (BATer for short) to improve the performance of adversarial example detection.
arXiv Detail & Related papers (2021-05-18T15:51:24Z)
- Learning Defense Transformers for Counterattacking Adversarial Examples [43.59730044883175]
Deep neural networks (DNNs) are vulnerable to adversarial examples with small perturbations.
Existing defense methods focus on some specific types of adversarial examples and may fail to defend well in real-world applications.
We study adversarial examples from a new perspective: whether we can defend against them by pulling them back to the original clean distribution.
arXiv Detail & Related papers (2021-03-13T02:03:53Z)
- SpectralDefense: Detecting Adversarial Attacks on CNNs in the Fourier Domain [10.418647759223964]
We show how analysis in the Fourier domain of input images and feature maps can be used to distinguish benign test samples from adversarial images.
We propose two novel detection methods.
arXiv Detail & Related papers (2021-03-04T12:48:28Z)
- Error Diffusion Halftoning Against Adversarial Examples [85.11649974840758]
Adversarial examples contain carefully crafted perturbations that can fool deep neural networks into making wrong predictions.
We propose a new image transformation defense based on error diffusion halftoning, and combine it with adversarial training to defend against adversarial examples.
arXiv Detail & Related papers (2021-01-23T07:55:02Z)
- Adversarial Profiles: Detecting Out-Distribution & Adversarial Samples in Pre-trained CNNs [4.52308938611108]
We propose a method to detect adversarial and out-distribution examples against a pre-trained CNN.
To this end, we create adversarial profiles for each class using only one adversarial attack generation technique.
Our initial evaluation of this approach on the MNIST dataset shows that adversarial-profile-based detection is effective in detecting at least 92% of out-distribution examples and 59% of adversarial examples.
arXiv Detail & Related papers (2020-11-18T07:10:13Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)