NoiseCAM: Explainable AI for the Boundary Between Noise and Adversarial
Attacks
- URL: http://arxiv.org/abs/2303.06151v1
- Date: Thu, 9 Mar 2023 22:07:41 GMT
- Title: NoiseCAM: Explainable AI for the Boundary Between Noise and Adversarial
Attacks
- Authors: Wenkai Tan, Justus Renkhoff, Alvaro Velasquez, Ziyu Wang, Lusi Li,
Jian Wang, Shuteng Niu, Fan Yang, Yongxin Liu, Houbing Song
- Abstract summary: Adversarial attacks can easily mislead a neural network and lead to wrong decisions.
In this paper, we use the gradient-weighted class activation map (GradCAM) to analyze the behavior deviation of the VGG-16 network.
We also propose a novel NoiseCAM algorithm that integrates information from globally weighted and pixel-level weighted class activation maps.
- Score: 21.86821880164293
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Deep Learning (DL) and Deep Neural Networks (DNNs) are widely used in various
domains. However, adversarial attacks can easily mislead a neural network and
lead to wrong decisions. Defense mechanisms are therefore highly desirable in
safety-critical applications. In this paper, we first use the gradient-weighted
class activation map (GradCAM) to analyze the behavior deviation of the VGG-16
network when its inputs are mixed with adversarial perturbation or Gaussian
noise. In particular, our method can locate vulnerable layers that are
sensitive to adversarial perturbation and Gaussian noise. We also show that the
behavior deviation of these vulnerable layers can be used to detect adversarial
examples. Second, we propose a novel NoiseCAM algorithm that integrates
information from globally weighted and pixel-level weighted class activation
maps. The algorithm is sensitive to adversarial perturbations and does not
respond to Gaussian random noise mixed into the inputs. Third, we compare
detecting adversarial examples using behavior deviation and using NoiseCAM, and
show that NoiseCAM outperforms behavior-deviation modeling in overall
performance. Our work could provide a useful tool to defend against certain
adversarial attacks on deep neural networks.
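A minimal, hypothetical sketch of the GradCAM-based behavior-deviation analysis described in the abstract, using a pretrained torchvision VGG-16. The monitored layer, noise level, and deviation metric are illustrative assumptions, not the paper's NoiseCAM implementation:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Hedged sketch, not the authors' code: GradCAM on a pretrained VGG-16 and a
# simple "behavior deviation" measure between clean and noisy inputs.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
target_layer = model.features[28]  # last conv layer of VGG-16 (illustrative choice)

activations, gradients = {}, {}
target_layer.register_forward_hook(lambda m, i, o: activations.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(g=go[0]))

def gradcam(x, class_idx=None):
    """Return an upsampled GradCAM heatmap for an input batch x of shape (N, 3, 224, 224)."""
    logits = model(x)
    if class_idx is None:
        class_idx = logits.argmax(dim=1)
    model.zero_grad()
    logits.gather(1, class_idx.view(-1, 1)).sum().backward()
    a, g = activations["a"], gradients["g"]
    weights = g.mean(dim=(2, 3), keepdim=True)             # channel weights: pooled gradients
    cam = F.relu((weights * a).sum(dim=1, keepdim=True))   # weighted sum of feature maps
    return F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)

# Behavior deviation between the CAM of a clean input and the CAM of the same
# input mixed with Gaussian noise; an adversarial input is handled identically.
x = torch.rand(1, 3, 224, 224)                  # stand-in for a normalized image
deviation = (gradcam(x) - gradcam(x + 0.03 * torch.randn_like(x))).abs().mean()
print(f"mean GradCAM deviation under Gaussian noise: {deviation.item():.4f}")
```

NoiseCAM itself additionally combines globally weighted and pixel-level weighted class activation maps, which is not reproduced in this sketch.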
Related papers
- Discerning the Chaos: Detecting Adversarial Perturbations while Disentangling Intentional from Unintentional Noises [41.57633238074266]
This paper introduces CIAI, a Class-Independent Adversarial Intent detection network built on a modified vision transformer with detection layers.
CIAI employs a novel loss function that combines Maximum Mean Discrepancy and Center Loss to detect both intentional (adversarial attacks) and unintentional noise.
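A rough, hypothetical sketch of the combined objective mentioned above: an RBF-kernel Maximum Mean Discrepancy term plus a center loss over feature batches. The kernel bandwidth, loss weighting, and feature dimensions are assumptions, not CIAI's actual configuration:

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Maximum Mean Discrepancy between two feature batches with an RBF kernel (bandwidth assumed)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def center_loss(features, labels, centers):
    """Pull each feature toward its class center."""
    return (features - centers[labels]).pow(2).sum(dim=1).mean()

# Toy usage: features of clean vs. perturbed inputs, 10 classes, 64-dim features.
clean_f, pert_f = torch.randn(32, 64), torch.randn(32, 64)
labels = torch.randint(0, 10, (32,))
centers = torch.randn(10, 64, requires_grad=True)
loss = mmd_rbf(clean_f, pert_f) + 0.1 * center_loss(clean_f, labels, centers)  # 0.1 is an assumed weight
print(loss.item())
```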
arXiv Detail & Related papers (2024-09-29T09:10:43Z)
- Exploring Adversarial Attacks on Neural Networks: An Explainable Approach [18.063187159491182]
We analyze the response characteristics of the VGG-16 model when the input images are mixed with adversarial noise and statistically similar Gaussian random noise.
Our work could provide valuable insights into developing more reliable Deep Neural Network (DNN) models.
arXiv Detail & Related papers (2023-03-08T07:59:44Z)
- Beyond Pretrained Features: Noisy Image Modeling Provides Adversarial Defense [52.66971714830943]
Masked image modeling (MIM) has become a prevailing framework for self-supervised visual representation learning.
In this paper, we investigate how this powerful self-supervised learning paradigm can provide adversarial robustness to downstream classifiers.
We propose an adversarial defense method, referred to as De3, by exploiting the pretrained decoder for denoising.
arXiv Detail & Related papers (2023-02-02T12:37:24Z)
- Detecting Adversaries, yet Faltering to Noise? Leveraging Conditional Variational AutoEncoders for Adversary Detection in the Presence of Noisy Images [0.7734726150561086]
Conditional Variational AutoEncoders (CVAE) are surprisingly good at detecting imperceptible image perturbations.
We show how CVAEs can be effectively used to detect adversarial attacks on image classification networks.
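A minimal, hypothetical sketch of reconstruction-error-based adversary detection with a small conditional VAE; the architecture sizes and the detection threshold are illustrative, not the CVAE from this paper:

```python
import torch
import torch.nn as nn

class TinyCVAE(nn.Module):
    """Toy conditional VAE on flattened inputs; dimensions are assumptions."""
    def __init__(self, x_dim=784, y_dim=10, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + y_dim, 128), nn.ReLU())
        self.mu, self.logvar = nn.Linear(128, z_dim), nn.Linear(128, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + y_dim, 128), nn.ReLU(),
                                 nn.Linear(128, x_dim), nn.Sigmoid())

    def forward(self, x, y):
        h = self.enc(torch.cat([x, y], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        return self.dec(torch.cat([z, y], dim=1)), mu, logvar

def is_adversarial(model, x, y_onehot, threshold=0.05):
    """Flag inputs whose reconstruction error exceeds a (calibrated) threshold."""
    with torch.no_grad():
        recon, _, _ = model(x, y_onehot)
    return (recon - x).pow(2).mean(dim=1) > threshold

# Toy usage on flattened 28x28 inputs conditioned on one-hot predicted labels.
model = TinyCVAE()
x = torch.rand(4, 784)
y = torch.eye(10)[torch.randint(0, 10, (4,))]
print(is_adversarial(model, x, y))
```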
arXiv Detail & Related papers (2021-11-28T20:36:27Z)
- On Procedural Adversarial Noise Attack And Defense [2.5388455804357952]
Adversarial examples can lead neural networks to make prediction errors with only small perturbations on the input images.
In this paper, we propose two universal adversarial perturbation (UAP) generation methods based on procedural noise functions.
Without changing the semantic representations, the adversarial examples generated via our methods show superior attack performance.
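A toy illustration of the idea above, with a sum of random sinusoids standing in for the paper's procedural noise functions (e.g., Perlin or Gabor noise); the frequencies and the L-infinity budget are assumptions:

```python
import math
import torch

def procedural_noise(h=224, w=224, n_waves=8, seed=0):
    """Image-agnostic procedural pattern: a sum of random sinusoids (a stand-in, not the paper's noise)."""
    g = torch.Generator().manual_seed(seed)
    ys, xs = torch.meshgrid(torch.arange(h).float(), torch.arange(w).float(), indexing="ij")
    noise = torch.zeros(h, w)
    for _ in range(n_waves):
        fx, fy = torch.rand(2, generator=g) * 0.2            # random spatial frequencies (assumed range)
        phase = torch.rand(1, generator=g) * 2 * math.pi
        noise += torch.sin(2 * math.pi * (fx * xs + fy * ys) + phase)
    return noise / noise.abs().max()                          # normalize to [-1, 1]

eps = 8 / 255                                                 # assumed perturbation budget
uap = eps * procedural_noise().sign()                         # universal (input-independent) perturbation
x = torch.rand(1, 3, 224, 224)                                # stand-in image batch
x_adv = (x + uap).clamp(0, 1)                                 # the same pattern is added to any input
```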
arXiv Detail & Related papers (2021-08-10T02:47:01Z)
- Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
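A hedged stand-in for the approach above: monitor a network's inner activations and score inputs with a density model fitted on in-distribution data. Here a diagonal Gaussian replaces the normalizing flow, and the backbone, monitored layer, and threshold are illustrative assumptions, not DAAIN's actual setup:

```python
import torch
from torchvision import models

# Hook a pooled activation of an (assumed) backbone; DAAIN's choice of layer differs.
model = models.resnet18(weights=None).eval()
feats = {}
model.avgpool.register_forward_hook(lambda m, i, o: feats.update(a=o.flatten(1)))

@torch.no_grad()
def activations(x):
    model(x)
    return feats["a"]

# Fit a diagonal Gaussian density on (stand-in) in-distribution activations.
train_acts = activations(torch.rand(64, 3, 224, 224))
mu, var = train_acts.mean(0), train_acts.var(0) + 1e-6

@torch.no_grad()
def log_density(x):
    a = activations(x)
    return (-0.5 * ((a - mu) ** 2 / var + var.log())).sum(dim=1)

# Inputs whose activation log-density falls below a threshold calibrated on the
# training scores are flagged as OOD or adversarial.
train_scores = (-0.5 * ((train_acts - mu) ** 2 / var + var.log())).sum(dim=1)
threshold = train_scores.quantile(0.05)   # flag the lowest-density 5% (assumed rate)
print(log_density(torch.rand(4, 3, 224, 224)) < threshold)
```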
arXiv Detail & Related papers (2021-05-30T22:07:13Z)
- Removing Adversarial Noise in Class Activation Feature Space [160.78488162713498]
We propose to remove adversarial noise by implementing a self-supervised adversarial training mechanism in a class activation feature space.
We train a denoising model to minimize the distances between the adversarial examples and the natural examples in the class activation feature space.
Empirical evaluations demonstrate that our method could significantly enhance adversarial robustness in comparison to previous state-of-the-art approaches.
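A hypothetical sketch of the training signal described above, where a denoiser is optimized so that denoised adversarial examples land close to their natural counterparts in a class activation feature space; the placeholder networks and the L2 distance are assumptions, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def caf_denoising_loss(denoiser, feature_extractor, x_nat, x_adv):
    """L2 distance between natural and denoised-adversarial class activation features."""
    f_nat = feature_extractor(x_nat).detach()     # target features from natural inputs
    f_den = feature_extractor(denoiser(x_adv))    # features of the denoised adversarial inputs
    return F.mse_loss(f_den, f_nat)

# Toy usage with identity/flatten stand-ins for the denoiser and feature extractor.
loss = caf_denoising_loss(torch.nn.Identity(), torch.nn.Flatten(),
                          torch.rand(4, 3, 32, 32), torch.rand(4, 3, 32, 32))
print(loss.item())
```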
arXiv Detail & Related papers (2021-04-19T10:42:24Z)
- Combating Adversaries with Anti-Adversaries [118.70141983415445]
In particular, our layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost on clean accuracy.
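A hedged sketch of the anti-adversary idea above: before classification, the input is nudged in the direction that decreases the loss of the model's own prediction, the opposite of an FGSM-style attack. The backbone, step size, and number of steps are assumptions, not the paper's exact settings:

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()   # assumed backbone

def anti_adversary(x, step=2 / 255, n_steps=2):
    """Return an anti-adversarially shifted copy of x (step size and iterations assumed)."""
    x_aa = x.clone()
    for _ in range(n_steps):
        x_aa.requires_grad_(True)
        logits = model(x_aa)
        pseudo_label = logits.argmax(dim=1)                      # the model's own prediction
        loss = F.cross_entropy(logits, pseudo_label)
        grad, = torch.autograd.grad(loss, x_aa)
        x_aa = (x_aa - step * grad.sign()).clamp(0, 1).detach()  # descend the loss, not ascend
    return x_aa

x = torch.rand(2, 3, 224, 224)
logits = model(anti_adversary(x))   # classify the anti-adversarially shifted input
```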
arXiv Detail & Related papers (2021-03-26T09:36:59Z)
- Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach that learns to generate new samples so as to maximize the classifier's exposure to the attribute space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z)
- Improved Detection of Adversarial Images Using Deep Neural Networks [2.3993545400014873]
Recent studies indicate that machine learning models used for classification tasks are vulnerable to adversarial examples.
We propose a new approach called Feature Map Denoising to detect the adversarial inputs.
We show the performance of detection on a mixed dataset consisting of adversarial examples.
arXiv Detail & Related papers (2020-07-10T19:02:24Z)