Detecting Adversarial Examples by Input Transformations, Defense
Perturbations, and Voting
- URL: http://arxiv.org/abs/2101.11466v1
- Date: Wed, 27 Jan 2021 14:50:41 GMT
- Title: Detecting Adversarial Examples by Input Transformations, Defense
Perturbations, and Voting
- Authors: Federico Nesti, Alessandro Biondi, Giorgio Buttazzo
- Abstract summary: convolutional neural networks (CNNs) have proved to reach super-human performance in visual recognition tasks.
CNNs can easily be fooled by adversarial examples, i.e., maliciously-crafted images that force the networks to predict an incorrect output.
This paper extensively explores the detection of adversarial examples via image transformations and proposes a novel methodology.
- Score: 71.57324258813674
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Over the last few years, convolutional neural networks (CNNs) have proved to
reach super-human performance in visual recognition tasks. However, CNNs can
easily be fooled by adversarial examples, i.e., maliciously-crafted images that
force the networks to predict an incorrect output while being extremely similar
to those for which a correct output is predicted. Regular adversarial examples
are not robust to input image transformations, which can then be used to detect
whether an adversarial example is presented to the network. Nevertheless, it is
still possible to generate adversarial examples that are robust to such
transformations.
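As an illustration of this transformation-based detection idea, the following is a minimal sketch, assuming a PyTorch classifier and a simple downscale-upscale transformation; the transformation choice and function names are illustrative assumptions, not the paper's exact procedure. An input whose predicted class changes after the transformation is flagged as a suspected adversarial example.

```python
import torch
import torch.nn.functional as F


def flag_by_transformation(model, x, scale=0.5):
    """Flag suspected adversarial inputs via prediction consistency.

    model: a PyTorch nn.Module classifier mapping (N, C, H, W) images to logits.
    x:     a batch of images, shape (N, C, H, W).
    Returns a boolean tensor of shape (N,): True where the predicted class
    changes under a simple downscale-upscale transformation (an assumed,
    illustrative transformation).
    """
    model.eval()
    with torch.no_grad():
        y_plain = model(x).argmax(dim=1)
        h, w = x.shape[-2:]
        # Downscale and upscale the image; regular adversarial perturbations
        # often do not survive this kind of resampling.
        x_small = F.interpolate(x, scale_factor=scale, mode="bilinear",
                                align_corners=False)
        x_back = F.interpolate(x_small, size=(h, w), mode="bilinear",
                               align_corners=False)
        y_trans = model(x_back).argmax(dim=1)
    # Disagreement between the two predictions is treated as a detection.
    return y_plain != y_trans
```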
This paper extensively explores the detection of adversarial examples via
image transformations and proposes a novel methodology, called \textit{defense
perturbation}, to detect robust adversarial examples with the same input
transformations the adversarial examples are robust to. Such a \textit{defense
perturbation} is shown to be an effective counter-measure to robust adversarial
examples.
Furthermore, multi-network adversarial examples are introduced. This kind of
adversarial examples can be used to simultaneously fool multiple networks,
which is critical in systems that use network redundancy, such as those based
on architectures with majority voting over multiple CNNs. An extensive set of
experiments based on state-of-the-art CNNs trained on the ImageNet dataset is
finally reported.
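Since the abstract refers to redundancy schemes based on majority voting over multiple CNNs, the sketch below shows a generic voting ensemble in PyTorch; it is an illustrative assumption (function name, ensemble composition), not the specific architecture evaluated in the paper.

```python
import torch


def majority_vote(models, x):
    """Classify a batch of images by majority vote over several CNNs.

    models: a list of PyTorch classifiers, each mapping (N, C, H, W) images
            to logits (e.g., independently trained or differently seeded CNNs).
    x:      a batch of images, shape (N, C, H, W).
    Returns the per-image majority class, shape (N,).
    """
    with torch.no_grad():
        # Stack per-model class predictions into shape (num_models, N).
        preds = torch.stack([m(x).argmax(dim=1) for m in models], dim=0)
    # torch.mode along the model axis returns the most frequent class per image.
    return torch.mode(preds, dim=0).values
```

A multi-network adversarial example, in the sense of the abstract above, is crafted so that every member of `models` misclassifies it, which is exactly why such redundancy alone is not a sufficient defense.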
Related papers
- A Geometrical Approach to Evaluate the Adversarial Robustness of Deep
Neural Networks [52.09243852066406]
The Adversarial Converging Time Score (ACTS) measures convergence time as an adversarial robustness metric.
We validate the effectiveness and generalization of the proposed ACTS metric against different adversarial attacks on the large-scale ImageNet dataset.
arXiv Detail & Related papers (2023-10-10T09:39:38Z)
- Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial Detection [22.99930028876662]
Convolutional neural networks (CNNs) define the state-of-the-art solution on many perceptual tasks.
Current CNN approaches largely remain vulnerable to adversarial perturbations of the input that have been crafted specifically to fool the system.
We propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks.
arXiv Detail & Related papers (2022-12-13T17:51:32Z)
- Block-Sparse Adversarial Attack to Fool Transformer-Based Text Classifiers [49.50163349643615]
In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers.
Experimental results demonstrate that, while our adversarial attack maintains the semantics of the sentence, it can reduce the accuracy of GPT-2 to less than 5%.
arXiv Detail & Related papers (2022-03-11T14:37:41Z)
- Adversarial Examples Detection with Bayesian Neural Network [57.185482121807716]
We propose a new framework to detect adversarial examples, motivated by the observation that random components can improve the smoothness of predictors.
Specifically, we propose a novel Bayesian adversarial example detector (BATer for short) to improve the performance of adversarial example detection.
arXiv Detail & Related papers (2021-05-18T15:51:24Z)
- Learning Defense Transformers for Counterattacking Adversarial Examples [43.59730044883175]
Deep neural networks (DNNs) are vulnerable to adversarial examples with small perturbations.
Existing defense methods focus on some specific types of adversarial examples and may fail to defend well in real-world applications.
We study adversarial examples from a new perspective: whether we can defend against them by pulling them back to the original clean distribution.
arXiv Detail & Related papers (2021-03-13T02:03:53Z)
- SpectralDefense: Detecting Adversarial Attacks on CNNs in the Fourier Domain [10.418647759223964]
We show how analysis in the Fourier domain of input images and feature maps can be used to distinguish benign test samples from adversarial images.
We propose two novel detection methods.
arXiv Detail & Related papers (2021-03-04T12:48:28Z)
- Error Diffusion Halftoning Against Adversarial Examples [85.11649974840758]
Adversarial examples contain carefully crafted perturbations that can fool deep neural networks into making wrong predictions.
We propose a new image transformation defense based on error diffusion halftoning, and combine it with adversarial training to defend against adversarial examples.
arXiv Detail & Related papers (2021-01-23T07:55:02Z)
- Adversarial Profiles: Detecting Out-Distribution & Adversarial Samples in Pre-trained CNNs [4.52308938611108]
We propose a method to detect adversarial and out-distribution examples against a pre-trained CNN.
To this end, we create adversarial profiles for each class using only one adversarial attack generation technique.
Our initial evaluation of this approach on the MNIST dataset shows that adversarial-profile-based detection is effective in detecting at least 92% of out-distribution examples and 59% of adversarial examples.
arXiv Detail & Related papers (2020-11-18T07:10:13Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)