Deep neural network loses attention to adversarial images
- URL: http://arxiv.org/abs/2106.05657v1
- Date: Thu, 10 Jun 2021 11:06:17 GMT
- Title: Deep neural network loses attention to adversarial images
- Authors: Shashank Kotyan and Danilo Vasconcellos Vargas
- Abstract summary: Adversarial algorithms have been shown to be effective against neural networks for a variety of tasks.
We show that in the case of Pixel Attack, perturbed pixels either draw the network's attention to themselves or divert attention away from them.
We also show that both attacks affect the saliency maps and activation maps differently.
- Score: 11.650381752104296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial algorithms have been shown to be effective against neural
networks for a variety of tasks. For image classification, some adversarial
algorithms perturb all the pixels in the image minimally, while others perturb
only a few pixels strongly. However, very little is known about why such diverse
adversarial samples exist. Recently, Vargas et al. showed that the existence of
these adversarial samples might be due to conflicting saliency within the
neural network. We test this hypothesis of conflicting saliency by analysing
the Saliency Maps (SM) and Gradient-weighted Class Activation Maps (Grad-CAM)
of original samples and a few different types of adversarial samples. We also
analyse how different adversarial samples distort the attention of the neural
network compared to original samples. We show that in the case of Pixel Attack,
the perturbed pixels either draw the network's attention to themselves or divert
attention away from them. In contrast, the Projected Gradient Descent Attack
perturbs pixels so that intermediate layers inside the neural network lose
attention for the correct class. We also show that both attacks affect the
saliency maps and activation maps differently, shedding light on why defences
that are successful against some attacks remain vulnerable to other attacks. We
hope that this analysis will improve understanding of the existence and the
effect of adversarial samples and enable the community to develop more robust
neural networks.
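To make the comparison described in the abstract concrete, the snippet below is a minimal sketch of how a vanilla Saliency Map and a Grad-CAM heatmap might be computed for a clean image and a perturbed counterpart so that the two attention patterns can be compared. The model (a torchvision ResNet-18 with random weights), the choice of `layer4[-1]` as the Grad-CAM target layer, and the random placeholder images and noise perturbation are illustrative assumptions, not the paper's exact setup or its Pixel Attack / PGD implementations.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Stand-in classifier; swap in the actual trained model and real images in practice.
model = models.resnet18(weights=None).eval()
target_layer = model.layer4[-1]  # last residual block, a common Grad-CAM choice

feats, grads = {}, {}

def save_activation(_, __, output):
    # Keep the layer's activations and register a tensor hook to catch their gradient.
    feats["a"] = output
    output.register_hook(lambda g: grads.update(a=g))

target_layer.register_forward_hook(save_activation)

def saliency_and_gradcam(image, class_idx=None):
    """Return (vanilla saliency map, Grad-CAM map) for an image of shape (1, 3, H, W)."""
    image = image.clone().requires_grad_(True)
    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()

    # Saliency Map: maximum absolute input gradient across colour channels.
    saliency = image.grad.abs().max(dim=1)[0].squeeze(0)

    # Grad-CAM: weight each feature map by its global-average-pooled gradient.
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return saliency.detach(), cam.squeeze().detach()

clean = torch.rand(1, 3, 224, 224)                                # placeholder image
perturbed = (clean + 0.03 * torch.randn_like(clean)).clamp(0, 1)  # placeholder perturbation

sm_clean, cam_clean = saliency_and_gradcam(clean)
sm_adv, cam_adv = saliency_and_gradcam(perturbed)
print("mean absolute Grad-CAM shift:", (cam_clean - cam_adv).abs().mean().item())
```

In the paper's setting, the placeholder perturbation would be replaced by an actual Pixel Attack or PGD adversarial example, and the clean-versus-adversarial maps would be compared per layer and per class.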
Related papers
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- Identification of Attack-Specific Signatures in Adversarial Examples [62.17639067715379]
We show that different attack algorithms produce adversarial examples which are distinct not only in their effectiveness but also in how they qualitatively affect their victims.
Our findings suggest that prospective adversarial attacks should be compared not only via their success rates at fooling models but also via deeper downstream effects they have on victims.
arXiv Detail & Related papers (2021-10-13T15:40:48Z)
- Attack to Fool and Explain Deep Networks [59.97135687719244]
We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations.
Our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
arXiv Detail & Related papers (2021-06-20T03:07:36Z)
- Chromatic and spatial analysis of one-pixel attacks against an image classifier [0.0]
This research presents ways to analyze chromatic and spatial distributions of one-pixel attacks.
We show that more effective attacks change the pixel's color more strongly, and that successful attacks are concentrated at the center of the images.
arXiv Detail & Related papers (2021-05-28T12:21:58Z)
- Black-box adversarial attacks using Evolution Strategies [3.093890460224435]
We study the generation of black-box adversarial attacks for image classification tasks.
Our results show that the attacked neural networks can be, in most cases, easily fooled by all the algorithms under comparison.
Some black-box optimization algorithms may be better in "harder" setups, both in terms of attack success rate and efficiency.
arXiv Detail & Related papers (2021-04-30T15:33:07Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study the robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- SpectralDefense: Detecting Adversarial Attacks on CNNs in the Fourier Domain [10.418647759223964]
We show how analysis in the Fourier domain of input images and feature maps can be used to distinguish benign test samples from adversarial images.
We propose two novel detection methods.
arXiv Detail & Related papers (2021-03-04T12:48:28Z)
- GreedyFool: Distortion-Aware Sparse Adversarial Attack [138.55076781355206]
Modern deep neural networks (DNNs) are vulnerable to adversarial samples.
Sparse adversarial samples can fool the target model by only perturbing a few pixels.
We propose a novel two-stage, distortion-aware greedy method dubbed "GreedyFool".
arXiv Detail & Related papers (2020-10-26T17:59:07Z)
- Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes [51.31334977346847]
We train networks to form coarse impressions based on the information in higher bit planes, and use the lower bit planes only to refine their prediction.
We demonstrate that, by imposing consistency on the representations learned across differently quantized images, the adversarial robustness of networks improves significantly.
arXiv Detail & Related papers (2020-04-01T09:31:10Z)
- Recurrent Attention Model with Log-Polar Mapping is Robust against Adversarial Attacks [0.0]
We develop a novel artificial neural network model that recurrently collects data with a log-polar field of view controlled by attention.
We demonstrate the effectiveness of this design as a defense against SPSA and PGD adversarial attacks.
arXiv Detail & Related papers (2020-02-13T08:40:48Z)
- Adversarial Attacks on Convolutional Neural Networks in Facial Recognition Domain [2.4704085162861693]
Adversarial attacks that render Deep Neural Network (DNN) classifiers vulnerable in real life represent a serious threat in autonomous vehicles, malware filters, or biometric authentication systems.
We apply the Fast Gradient Sign Method to introduce perturbations into a facial image dataset and then test the output on a different classifier (a minimal illustrative FGSM transfer sketch follows this list).
We also craft a variety of black-box attacks on the facial image dataset, assuming minimal adversarial knowledge.
arXiv Detail & Related papers (2020-01-30T00:25:05Z)
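The last entry above describes crafting FGSM perturbations on one facial-image classifier and evaluating them on another. As a rough, hedged illustration of that transfer setup only (the two torchvision models with random weights, the epsilon of 8/255, and the random placeholder image and label are assumptions, not taken from any of the cited papers), a one-step FGSM transfer test could look like this:

```python
import torch
import torch.nn.functional as F
from torchvision import models

def fgsm(model, image, label, epsilon=8 / 255):
    """One-step FGSM: shift each pixel by +/- epsilon along the sign of the loss gradient."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()

# Illustrative transfer test: craft the perturbation on one classifier and
# evaluate it on another (both are stand-ins for the cited paper's models).
source = models.resnet18(weights=None).eval()
target = models.mobilenet_v3_small(weights=None).eval()

image = torch.rand(1, 3, 224, 224)   # placeholder for a real facial image
label = torch.tensor([0])            # placeholder ground-truth class

adv = fgsm(source, image, label)
print("target prediction (clean):", target(image).argmax(dim=1).item())
print("target prediction (adv):  ", target(adv).argmax(dim=1).item())
```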