Understanding and Diagnosing Vulnerability under Adversarial Attacks
- URL: http://arxiv.org/abs/2007.08716v1
- Date: Fri, 17 Jul 2020 01:56:28 GMT
- Title: Understanding and Diagnosing Vulnerability under Adversarial Attacks
- Authors: Haizhong Zheng, Ziqi Zhang, Honglak Lee, Atul Prakash
- Abstract summary: Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks.
We propose a novel interpretability method, InterpretGAN, to generate explanations for features used for classification in latent variables.
We also design the first diagnostic method to quantify the vulnerability contributed by each layer.
- Score: 62.661498155101654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to adversarial
attacks. Currently, there is no clear insight into how slight perturbations
cause such a large difference in classification results and how we can design a
more robust model architecture. In this work, we propose a novel
interpretability method, InterpretGAN, to generate explanations for features
used for classification in latent variables. Interpreting the classification
process of adversarial examples exposes how adversarial perturbations influence
features layer by layer as well as which features are modified by
perturbations. Moreover, we design the first diagnostic method to quantify the
vulnerability contributed by each layer, which can be used to identify
vulnerable parts of model architectures. The diagnostic results show that the
layers introducing more information loss tend to be more vulnerable than other
layers. Based on the findings, our evaluation results on MNIST and CIFAR10
datasets suggest that average pooling layers, with lower information loss, are
more robust than max pooling layers for the network architectures studied in
this paper.
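A minimal, hypothetical PyTorch sketch of the kind of per-layer comparison the abstract describes is given below. It does not reproduce InterpretGAN or the paper's information-loss diagnostic; it uses untrained toy networks, FGSM as a stand-in attack, and invented helpers (`make_cnn`, `fgsm`, `layer_drift`) to show how one might track, layer by layer, how much a perturbation shifts the features of otherwise identical max-pooling and average-pooling CNNs.

```python
# Hypothetical sketch: compare per-layer feature drift between clean and
# FGSM-perturbed inputs for a small CNN built with max or average pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_cnn(pool: str) -> nn.Sequential:
    """Tiny MNIST-sized CNN; `pool` selects 'max' or 'avg' pooling."""
    Pool = nn.MaxPool2d if pool == "max" else nn.AvgPool2d
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), Pool(2),   # 28x28 -> 14x14
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), Pool(2),  # 14x14 -> 7x7
        nn.Flatten(), nn.Linear(32 * 7 * 7, 10),
    )

def fgsm(model, x, y, eps=0.1):
    """One-step FGSM attack: perturb x within an L-infinity ball of radius eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

@torch.no_grad()
def layer_drift(model, x_clean, x_adv):
    """Relative feature change caused by the perturbation after each layer."""
    drifts, h_c, h_a = [], x_clean, x_adv
    for layer in model:
        h_c, h_a = layer(h_c), layer(h_a)
        drifts.append((layer.__class__.__name__,
                       ((h_a - h_c).norm() / (h_c.norm() + 1e-8)).item()))
    return drifts

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.rand(8, 1, 28, 28)            # stand-in for MNIST digits
    y = torch.randint(0, 10, (8,))
    for pool in ("max", "avg"):
        model = make_cnn(pool).eval()
        x_adv = fgsm(model, x, y)
        for name, drift in layer_drift(model, x, x_adv):
            print(f"{pool:>3} {name:<10} drift={drift:.3f}")
```

With trained models and a stronger attack, the same loop would let one compare how quickly perturbations grow through the two pooling choices, in the spirit of the paper's finding that average pooling loses less information and tends to be more robust.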
Related papers
- Characterization of topological structures in different neural network architectures [0.0]
We develop methods for analyzing representations from different architectures and examine how they should be used to obtain valid results.
We applied these methods for ResNet, VGG19, and ViT architectures and found substantial differences along with some similarities.
arXiv Detail & Related papers (2024-07-08T18:02:18Z)
- A Systematic Evaluation of Node Embedding Robustness [77.29026280120277]
We assess the empirical robustness of node embedding models to random and adversarial poisoning attacks.
We compare edge addition, deletion and rewiring strategies computed using network properties as well as node labels.
We found that node classification suffers higher performance degradation than network reconstruction.
arXiv Detail & Related papers (2022-09-16T17:20:23Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Meta Adversarial Perturbations [66.43754467275967]
We show the existence of a meta adversarial perturbation (MAP).
A MAP causes natural images to be misclassified with high probability after only a one-step gradient ascent update (a minimal sketch of a shared one-step perturbation appears after this list).
We show that these perturbations are not only image-agnostic, but also model-agnostic, as a single perturbation generalizes well across unseen data points and different neural network architectures.
arXiv Detail & Related papers (2021-11-19T16:01:45Z)
- Identifying Layers Susceptible to Adversarial Attacks [3.1473798197405944]
Common neural network architectures are susceptible to attack by adversarial samples.
We show that susceptibility to adversarial samples is associated with low-level feature extraction layers.
This phenomenon could have two explanations: either adversarial attacks yield early-layer outputs that are indistinguishable from features of the attack classes, or they yield early-layer outputs that differ statistically from the features of non-adversarial samples.
arXiv Detail & Related papers (2021-07-10T12:38:49Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Examining the causal structures of deep neural networks using information theory [0.0]
Deep Neural Networks (DNNs) are often examined at the level of their response to input, such as analyzing the mutual information between nodes and data sets.
DNNs can also be examined at the level of causation, exploring "what does what" within the layers of the network itself.
Here, we introduce a suite of metrics based on information theory to quantify and track changes in the causal structure of DNNs during training.
arXiv Detail & Related papers (2020-10-26T19:53:16Z)
- Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
- Multi-Objective Variational Autoencoder: an Application for Smart Infrastructure Maintenance [1.2311105789643062]
We propose a multi-objective variational autoencoder (MVA) method for smart infrastructure damage detection and diagnosis in multi-way sensing data.
Our method fuses data from multiple sensors in a single ADNN, from which informative features are extracted and used for damage identification.
arXiv Detail & Related papers (2020-03-11T01:30:08Z)
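Referenced in the Meta Adversarial Perturbations entry above: a hypothetical sketch of the general idea of a single perturbation shared across a batch, refined by one gradient-ascent step and then reused on unseen inputs and a second model. It is not the paper's meta-learning procedure; the toy models and the helper `one_step_shared_perturbation` are invented for illustration.

```python
# Hypothetical sketch: one shared (image-agnostic) perturbation, refined by a
# single gradient-ascent step, then applied to unseen inputs and another model.
import torch
import torch.nn as nn
import torch.nn.functional as F

def one_step_shared_perturbation(model, x, y, eps=0.1, step=0.1):
    """One gradient-ascent step on a perturbation shared across the batch."""
    delta = torch.zeros(1, *x.shape[1:], requires_grad=True)  # one delta for all images
    F.cross_entropy(model(x + delta), y).backward()            # maximize classification loss
    with torch.no_grad():
        delta = (delta + step * delta.grad.sign()).clamp(-eps, eps)
    return delta

if __name__ == "__main__":
    torch.manual_seed(0)
    model_a = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    model_b = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    x, y = torch.rand(16, 1, 28, 28), torch.randint(0, 10, (16,))
    delta = one_step_shared_perturbation(model_a, x, y)
    x_new = torch.rand(16, 1, 28, 28)                           # unseen inputs
    print(model_b(x_new + delta).argmax(1))                     # same delta, different model
```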
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content presented (including all information) and is not responsible for any consequences.