NoiseCAM: Explainable AI for the Boundary Between Noise and Adversarial
Attacks
- URL: http://arxiv.org/abs/2303.06151v1
- Date: Thu, 9 Mar 2023 22:07:41 GMT
- Title: NoiseCAM: Explainable AI for the Boundary Between Noise and Adversarial
Attacks
- Authors: Wenkai Tan, Justus Renkhoff, Alvaro Velasquez, Ziyu Wang, Lusi Li,
Jian Wang, Shuteng Niu, Fan Yang, Yongxin Liu, Houbing Song
- Abstract summary: Adversarial attacks can easily mislead a neural network and lead to wrong decisions.
In this paper, we use the gradient-weighted class activation map (GradCAM) to analyze the behavior deviation of the VGG-16 network.
We also propose a novel NoiseCAM algorithm that integrates information from globally weighted and pixel-level weighted class activation maps.
- Score: 21.86821880164293
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Deep Learning (DL) and Deep Neural Networks (DNNs) are widely used in various
domains. However, adversarial attacks can easily mislead a neural network and
lead to wrong decisions. Defense mechanisms are therefore highly desirable in
safety-critical applications. In this paper, we first use the gradient-weighted
class activation map (GradCAM) to analyze the behavior deviation of the VGG-16
network when its inputs are mixed with adversarial perturbation or Gaussian
noise. In particular, our method can locate vulnerable layers that are
sensitive to adversarial perturbation and Gaussian noise. We also show that the
behavior deviation of these vulnerable layers can be used to detect adversarial
examples. Second, we propose a novel NoiseCAM algorithm that integrates
information from globally weighted and pixel-level weighted class activation
maps. The algorithm is sensitive to adversarial perturbations and does not
respond to Gaussian random noise mixed into the inputs. Third, we compare
detecting adversarial examples using behavior deviation and using NoiseCAM, and
show that NoiseCAM outperforms behavior-deviation modeling in overall
performance. Our work could provide a useful tool to defend against certain
adversarial attacks on deep neural networks.
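A minimal, hypothetical sketch of the GradCAM-based behavior-deviation analysis described in the abstract, using a pretrained torchvision VGG-16. The monitored layer, noise level, and deviation metric are illustrative assumptions, not the paper's NoiseCAM implementation:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Hedged sketch, not the authors' code: GradCAM on a pretrained VGG-16 and a
# simple "behavior deviation" measure between clean and noisy inputs.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
target_layer = model.features[28]  # last conv layer of VGG-16 (illustrative choice)

activations, gradients = {}, {}
target_layer.register_forward_hook(lambda m, i, o: activations.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(g=go[0]))

def gradcam(x, class_idx=None):
    """Return an upsampled GradCAM heatmap for an input batch x of shape (N, 3, 224, 224)."""
    logits = model(x)
    if class_idx is None:
        class_idx = logits.argmax(dim=1)
    model.zero_grad()
    logits.gather(1, class_idx.view(-1, 1)).sum().backward()
    a, g = activations["a"], gradients["g"]
    weights = g.mean(dim=(2, 3), keepdim=True)             # channel weights: pooled gradients
    cam = F.relu((weights * a).sum(dim=1, keepdim=True))   # weighted sum of feature maps
    return F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)

# Behavior deviation between the CAM of a clean input and the CAM of the same
# input mixed with Gaussian noise; an adversarial input is handled identically.
x = torch.rand(1, 3, 224, 224)                  # stand-in for a normalized image
deviation = (gradcam(x) - gradcam(x + 0.03 * torch.randn_like(x))).abs().mean()
print(f"mean GradCAM deviation under Gaussian noise: {deviation.item():.4f}")
```

NoiseCAM itself additionally combines globally weighted and pixel-level weighted class activation maps, which is not reproduced in this sketch.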
Related papers
- Discerning the Chaos: Detecting Adversarial Perturbations while Disentangling Intentional from Unintentional Noises [41.57633238074266]
This paper introduces CIAI, a Class-Independent Adversarial Intent detection network built on a modified vision transformer with detection layers.
CIAI employs a novel loss function that combines Maximum Mean Discrepancy and Center Loss to detect both intentional (adversarial attacks) and unintentional noise.
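A rough, hypothetical sketch of the combined objective mentioned above: an RBF-kernel Maximum Mean Discrepancy term plus a center loss over feature batches. The kernel bandwidth, loss weighting, and feature dimensions are assumptions, not CIAI's actual configuration:

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Maximum Mean Discrepancy between two feature batches with an RBF kernel (bandwidth assumed)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def center_loss(features, labels, centers):
    """Pull each feature toward its class center."""
    return (features - centers[labels]).pow(2).sum(dim=1).mean()

# Toy usage: features of clean vs. perturbed inputs, 10 classes, 64-dim features.
clean_f, pert_f = torch.randn(32, 64), torch.randn(32, 64)
labels = torch.randint(0, 10, (32,))
centers = torch.randn(10, 64, requires_grad=True)
loss = mmd_rbf(clean_f, pert_f) + 0.1 * center_loss(clean_f, labels, centers)  # 0.1 is an assumed weight
print(loss.item())
```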
arXiv Detail & Related papers (2024-09-29T09:10:43Z)
- Exploring Adversarial Attacks on Neural Networks: An Explainable Approach [18.063187159491182]
We analyze the response characteristics of the VGG-16 model when the input images are mixed with adversarial noise and statistically similar Gaussian random noise.
Our work could provide valuable insights into developing more reliable Deep Neural Network (DNN) models.
arXiv Detail & Related papers (2023-03-08T07:59:44Z)
- Beyond Pretrained Features: Noisy Image Modeling Provides Adversarial Defense [52.66971714830943]
Masked image modeling (MIM) has become a prevailing framework for self-supervised visual representation learning.
In this paper, we investigate how this powerful self-supervised learning paradigm can provide adversarial robustness to downstream classifiers.
We propose an adversarial defense method, referred to as De3, by exploiting the pretrained decoder for denoising.
arXiv Detail & Related papers (2023-02-02T12:37:24Z)
- Detecting Adversaries, yet Faltering to Noise? Leveraging Conditional Variational AutoEncoders for Adversary Detection in the Presence of Noisy Images [0.7734726150561086]
Conditional Variational AutoEncoders (CVAE) are surprisingly good at detecting imperceptible image perturbations.
We show how CVAEs can be effectively used to detect adversarial attacks on image classification networks.
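A minimal, hypothetical sketch of reconstruction-error-based adversary detection with a small conditional VAE; the architecture sizes and the detection threshold are illustrative, not the CVAE from this paper:

```python
import torch
import torch.nn as nn

class TinyCVAE(nn.Module):
    """Toy conditional VAE on flattened inputs; dimensions are assumptions."""
    def __init__(self, x_dim=784, y_dim=10, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + y_dim, 128), nn.ReLU())
        self.mu, self.logvar = nn.Linear(128, z_dim), nn.Linear(128, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + y_dim, 128), nn.ReLU(),
                                 nn.Linear(128, x_dim), nn.Sigmoid())

    def forward(self, x, y):
        h = self.enc(torch.cat([x, y], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        return self.dec(torch.cat([z, y], dim=1)), mu, logvar

def is_adversarial(model, x, y_onehot, threshold=0.05):
    """Flag inputs whose reconstruction error exceeds a (calibrated) threshold."""
    with torch.no_grad():
        recon, _, _ = model(x, y_onehot)
    return (recon - x).pow(2).mean(dim=1) > threshold

# Toy usage on flattened 28x28 inputs conditioned on one-hot predicted labels.
model = TinyCVAE()
x = torch.rand(4, 784)
y = torch.eye(10)[torch.randint(0, 10, (4,))]
print(is_adversarial(model, x, y))
```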
arXiv Detail & Related papers (2021-11-28T20:36:27Z)
- On Procedural Adversarial Noise Attack And Defense [2.5388455804357952]
Adversarial examples can lead neural networks to make prediction errors with only small perturbations on the input images.
In this paper, we propose two universal adversarial perturbation (UAP) generation methods based on procedural noise functions.
Without changing the semantic representations, the adversarial examples generated via our methods show superior attack performance.
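A toy illustration of the idea above, with a sum of random sinusoids standing in for the paper's procedural noise functions (e.g., Perlin or Gabor noise); the frequencies and the L-infinity budget are assumptions:

```python
import math
import torch

def procedural_noise(h=224, w=224, n_waves=8, seed=0):
    """Image-agnostic procedural pattern: a sum of random sinusoids (a stand-in, not the paper's noise)."""
    g = torch.Generator().manual_seed(seed)
    ys, xs = torch.meshgrid(torch.arange(h).float(), torch.arange(w).float(), indexing="ij")
    noise = torch.zeros(h, w)
    for _ in range(n_waves):
        fx, fy = torch.rand(2, generator=g) * 0.2            # random spatial frequencies (assumed range)
        phase = torch.rand(1, generator=g) * 2 * math.pi
        noise += torch.sin(2 * math.pi * (fx * xs + fy * ys) + phase)
    return noise / noise.abs().max()                          # normalize to [-1, 1]

eps = 8 / 255                                                 # assumed perturbation budget
uap = eps * procedural_noise().sign()                         # universal (input-independent) perturbation
x = torch.rand(1, 3, 224, 224)                                # stand-in image batch
x_adv = (x + uap).clamp(0, 1)                                 # the same pattern is added to any input
```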
arXiv Detail & Related papers (2021-08-10T02:47:01Z)
- Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
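A hedged stand-in for the approach above: monitor a network's inner activations and score inputs with a density model fitted on in-distribution data. Here a diagonal Gaussian replaces the normalizing flow, and the backbone, monitored layer, and threshold are illustrative assumptions, not DAAIN's actual setup:

```python
import torch
from torchvision import models

# Hook a pooled activation of an (assumed) backbone; DAAIN's choice of layer differs.
model = models.resnet18(weights=None).eval()
feats = {}
model.avgpool.register_forward_hook(lambda m, i, o: feats.update(a=o.flatten(1)))

@torch.no_grad()
def activations(x):
    model(x)
    return feats["a"]

# Fit a diagonal Gaussian density on (stand-in) in-distribution activations.
train_acts = activations(torch.rand(64, 3, 224, 224))
mu, var = train_acts.mean(0), train_acts.var(0) + 1e-6

@torch.no_grad()
def log_density(x):
    a = activations(x)
    return (-0.5 * ((a - mu) ** 2 / var + var.log())).sum(dim=1)

# Inputs whose activation log-density falls below a threshold calibrated on the
# training scores are flagged as OOD or adversarial.
train_scores = (-0.5 * ((train_acts - mu) ** 2 / var + var.log())).sum(dim=1)
threshold = train_scores.quantile(0.05)   # flag the lowest-density 5% (assumed rate)
print(log_density(torch.rand(4, 3, 224, 224)) < threshold)
```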
arXiv Detail & Related papers (2021-05-30T22:07:13Z)
- Removing Adversarial Noise in Class Activation Feature Space [160.78488162713498]
We propose to remove adversarial noise by implementing a self-supervised adversarial training mechanism in a class activation feature space.
We train a denoising model to minimize the distances between the adversarial examples and the natural examples in the class activation feature space.
Empirical evaluations demonstrate that our method could significantly enhance adversarial robustness in comparison to previous state-of-the-art approaches.
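A hypothetical sketch of the training signal described above, where a denoiser is optimized so that denoised adversarial examples land close to their natural counterparts in a class activation feature space; the placeholder networks and the L2 distance are assumptions, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def caf_denoising_loss(denoiser, feature_extractor, x_nat, x_adv):
    """L2 distance between natural and denoised-adversarial class activation features."""
    f_nat = feature_extractor(x_nat).detach()     # target features from natural inputs
    f_den = feature_extractor(denoiser(x_adv))    # features of the denoised adversarial inputs
    return F.mse_loss(f_den, f_nat)

# Toy usage with identity/flatten stand-ins for the denoiser and feature extractor.
loss = caf_denoising_loss(torch.nn.Identity(), torch.nn.Flatten(),
                          torch.rand(4, 3, 32, 32), torch.rand(4, 3, 32, 32))
print(loss.item())
```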
arXiv Detail & Related papers (2021-04-19T10:42:24Z)
- Combating Adversaries with Anti-Adversaries [118.70141983415445]
In particular, our layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost on clean accuracy.
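A hedged sketch of the anti-adversary idea above: before classification, the input is nudged in the direction that decreases the loss of the model's own prediction, the opposite of an FGSM-style attack. The backbone, step size, and number of steps are assumptions, not the paper's exact settings:

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()   # assumed backbone

def anti_adversary(x, step=2 / 255, n_steps=2):
    """Return an anti-adversarially shifted copy of x (step size and iterations assumed)."""
    x_aa = x.clone()
    for _ in range(n_steps):
        x_aa.requires_grad_(True)
        logits = model(x_aa)
        pseudo_label = logits.argmax(dim=1)                      # the model's own prediction
        loss = F.cross_entropy(logits, pseudo_label)
        grad, = torch.autograd.grad(loss, x_aa)
        x_aa = (x_aa - step * grad.sign()).clamp(0, 1).detach()  # descend the loss, not ascend
    return x_aa

x = torch.rand(2, 3, 224, 224)
logits = model(anti_adversary(x))   # classify the anti-adversarially shifted input
```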
arXiv Detail & Related papers (2021-03-26T09:36:59Z)
- Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach that learns to generate new samples so as to maximize the classifier's exposure to the attribute space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z)
- Improved Detection of Adversarial Images Using Deep Neural Networks [2.3993545400014873]
Recent studies indicate that machine learning models used for classification tasks are vulnerable to adversarial examples.
We propose a new approach called Feature Map Denoising to detect the adversarial inputs.
We show the performance of detection on a mixed dataset consisting of adversarial examples.
arXiv Detail & Related papers (2020-07-10T19:02:24Z)