Understanding and Diagnosing Vulnerability under Adversarial Attacks
- URL: http://arxiv.org/abs/2007.08716v1
- Date: Fri, 17 Jul 2020 01:56:28 GMT
- Title: Understanding and Diagnosing Vulnerability under Adversarial Attacks
- Authors: Haizhong Zheng, Ziqi Zhang, Honglak Lee, Atul Prakash
- Abstract summary: Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks.
We propose a novel interpretability method, InterpretGAN, to generate explanations for features used for classification in latent variables.
We also design the first diagnostic method to quantify the vulnerability contributed by each layer.
- Score: 62.661498155101654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to adversarial
attacks. Currently, there is no clear insight into how slight perturbations
cause such a large difference in classification results and how we can design a
more robust model architecture. In this work, we propose a novel
interpretability method, InterpretGAN, to generate explanations for features
used for classification in latent variables. Interpreting the classification
process of adversarial examples exposes how adversarial perturbations influence
features layer by layer as well as which features are modified by
perturbations. Moreover, we design the first diagnostic method to quantify the
vulnerability contributed by each layer, which can be used to identify
vulnerable parts of model architectures. The diagnostic results show that the
layers introducing more information loss tend to be more vulnerable than other
layers. Based on the findings, our evaluation results on MNIST and CIFAR10
datasets suggest that average pooling layers, with lower information loss, are
more robust than max pooling layers for the network architectures studied in
this paper.
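A minimal, hypothetical PyTorch sketch of the kind of per-layer comparison the abstract describes is given below. It does not reproduce InterpretGAN or the paper's information-loss diagnostic; it uses untrained toy networks, FGSM as a stand-in attack, and invented helpers (`make_cnn`, `fgsm`, `layer_drift`) to show how one might track, layer by layer, how much a perturbation shifts the features of otherwise identical max-pooling and average-pooling CNNs.

```python
# Hypothetical sketch: compare per-layer feature drift between clean and
# FGSM-perturbed inputs for a small CNN built with max or average pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_cnn(pool: str) -> nn.Sequential:
    """Tiny MNIST-sized CNN; `pool` selects 'max' or 'avg' pooling."""
    Pool = nn.MaxPool2d if pool == "max" else nn.AvgPool2d
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), Pool(2),   # 28x28 -> 14x14
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), Pool(2),  # 14x14 -> 7x7
        nn.Flatten(), nn.Linear(32 * 7 * 7, 10),
    )

def fgsm(model, x, y, eps=0.1):
    """One-step FGSM attack: perturb x within an L-infinity ball of radius eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

@torch.no_grad()
def layer_drift(model, x_clean, x_adv):
    """Relative feature change caused by the perturbation after each layer."""
    drifts, h_c, h_a = [], x_clean, x_adv
    for layer in model:
        h_c, h_a = layer(h_c), layer(h_a)
        drifts.append((layer.__class__.__name__,
                       ((h_a - h_c).norm() / (h_c.norm() + 1e-8)).item()))
    return drifts

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.rand(8, 1, 28, 28)            # stand-in for MNIST digits
    y = torch.randint(0, 10, (8,))
    for pool in ("max", "avg"):
        model = make_cnn(pool).eval()
        x_adv = fgsm(model, x, y)
        for name, drift in layer_drift(model, x, x_adv):
            print(f"{pool:>3} {name:<10} drift={drift:.3f}")
```

With trained models and a stronger attack, the same loop would let one compare how quickly perturbations grow through the two pooling choices, in the spirit of the paper's finding that average pooling loses less information and tends to be more robust.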
Related papers
- Characterization of topological structures in different neural network architectures [0.0]
We develop methods for analyzing representations from different architectures and examine how they should be used to obtain valid results.
We applied these methods for ResNet, VGG19, and ViT architectures and found substantial differences along with some similarities.
arXiv Detail & Related papers (2024-07-08T18:02:18Z)
- A Systematic Evaluation of Node Embedding Robustness [77.29026280120277]
We assess the empirical robustness of node embedding models to random and adversarial poisoning attacks.
We compare edge addition, deletion and rewiring strategies computed using network properties as well as node labels.
We found that node classification suffers higher performance degradation than network reconstruction.
arXiv Detail & Related papers (2022-09-16T17:20:23Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Meta Adversarial Perturbations [66.43754467275967]
We show the existence of a meta adversarial perturbation (MAP).
A MAP causes natural images to be misclassified with high probability after only a one-step gradient ascent update (a minimal sketch of a shared one-step perturbation appears after this list).
We show that these perturbations are not only image-agnostic, but also model-agnostic, as a single perturbation generalizes well across unseen data points and different neural network architectures.
arXiv Detail & Related papers (2021-11-19T16:01:45Z)
- Identifying Layers Susceptible to Adversarial Attacks [3.1473798197405944]
Common neural network architectures are susceptible to attack by adversarial samples.
We show that susceptibility to adversarial samples is associated with low-level feature extraction layers.
This phenomenon could have two explanations: either adversarial attacks yield early-layer outputs that are indistinguishable from features of the attack classes, or they yield early-layer outputs that differ statistically from the features of non-adversarial samples.
arXiv Detail & Related papers (2021-07-10T12:38:49Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Examining the causal structures of deep neural networks using information theory [0.0]
Deep Neural Networks (DNNs) are often examined at the level of their response to input, such as analyzing the mutual information between nodes and data sets.
DNNs can also be examined at the level of causation, exploring "what does what" within the layers of the network itself.
Here, we introduce a suite of metrics based on information theory to quantify and track changes in the causal structure of DNNs during training.
arXiv Detail & Related papers (2020-10-26T19:53:16Z)
- Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
- Multi-Objective Variational Autoencoder: an Application for Smart Infrastructure Maintenance [1.2311105789643062]
We propose a multi-objective variational autoencoder (MVA) method for smart infrastructure damage detection and diagnosis in multi-way sensing data.
Our method fuses data from multiple sensors in a single ADNN, from which informative features are extracted and used for damage identification.
arXiv Detail & Related papers (2020-03-11T01:30:08Z)
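Referenced in the Meta Adversarial Perturbations entry above: a hypothetical sketch of the general idea of a single perturbation shared across a batch, refined by one gradient-ascent step and then reused on unseen inputs and a second model. It is not the paper's meta-learning procedure; the toy models and the helper `one_step_shared_perturbation` are invented for illustration.

```python
# Hypothetical sketch: one shared (image-agnostic) perturbation, refined by a
# single gradient-ascent step, then applied to unseen inputs and another model.
import torch
import torch.nn as nn
import torch.nn.functional as F

def one_step_shared_perturbation(model, x, y, eps=0.1, step=0.1):
    """One gradient-ascent step on a perturbation shared across the batch."""
    delta = torch.zeros(1, *x.shape[1:], requires_grad=True)  # one delta for all images
    F.cross_entropy(model(x + delta), y).backward()            # maximize classification loss
    with torch.no_grad():
        delta = (delta + step * delta.grad.sign()).clamp(-eps, eps)
    return delta

if __name__ == "__main__":
    torch.manual_seed(0)
    model_a = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    model_b = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    x, y = torch.rand(16, 1, 28, 28), torch.randint(0, 10, (16,))
    delta = one_step_shared_perturbation(model_a, x, y)
    x_new = torch.rand(16, 1, 28, 28)                           # unseen inputs
    print(model_b(x_new + delta).argmax(1))                     # same delta, different model
```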
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content presented (including all information) and is not responsible for any consequences.