Analyzing the Noise Robustness of Deep Neural Networks
- URL: http://arxiv.org/abs/2001.09395v1
- Date: Sun, 26 Jan 2020 03:39:10 GMT
- Title: Analyzing the Noise Robustness of Deep Neural Networks
- Authors: Kelei Cao, Mengchen Liu, Hang Su, Jing Wu, Jun Zhu, Shixia Liu
- Abstract summary: Adversarial examples, generated by adding small but intentionally imperceptible perturbations to normal examples, can mislead deep neural networks (DNNs) to make incorrect predictions.
We present a visual analysis method to explain why adversarial examples are misclassified.
- Score: 43.63911131982369
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples, generated by adding small but intentionally
imperceptible perturbations to normal examples, can mislead deep neural
networks (DNNs) to make incorrect predictions. Although much work has been done
on both adversarial attack and defense, a fine-grained understanding of
adversarial examples is still lacking. To address this issue, we present a
visual analysis method to explain why adversarial examples are misclassified.
The key is to compare and analyze the datapaths of both the adversarial and
normal examples. A datapath is a group of critical neurons along with their
connections. We formulate the datapath extraction as a subset selection problem
and solve it by constructing and training a neural network. A multi-level
visualization consisting of a network-level visualization of data flows, a
layer-level visualization of feature maps, and a neuron-level visualization of
learned features, has been designed to help investigate how datapaths of
adversarial and normal examples diverge and merge in the prediction process. A
quantitative evaluation and a case study were conducted to demonstrate the
promise of our method to explain the misclassification of adversarial examples.
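The datapath idea can be made concrete with a small, hedged sketch: treat critical-neuron selection as learning a sparse channel mask that preserves the model's prediction. The backbone (resnet18), the masked layer, and the loss weights below are illustrative assumptions, not the authors' implementation.
```python
# Hypothetical sketch of "critical neuron" selection as subset selection: learn a soft
# channel mask at one layer so the masked network keeps its original prediction while
# the mask stays sparse. All architectural choices here are placeholders.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=None).eval()
x = torch.randn(1, 3, 224, 224)              # stand-in for a normal or adversarial example

feats = {}
model.layer3.register_forward_hook(lambda m, i, o: feats.__setitem__("layer3", o))

with torch.no_grad():
    target = model(x).argmax(1)              # class predicted with all neurons active

num_ch = 256                                 # channel count at resnet18's layer3
mask_logits = torch.zeros(num_ch, requires_grad=True)
opt = torch.optim.Adam([mask_logits], lr=0.05)

def masked_forward(x, mask):
    model(x)                                 # hook stores the layer3 feature map
    h = feats["layer3"] * mask.view(1, -1, 1, 1)
    h = model.avgpool(model.layer4(h)).flatten(1)
    return model.fc(h)

for _ in range(200):
    mask = torch.sigmoid(mask_logits)
    # fidelity to the original prediction plus a sparsity penalty on the mask
    loss = F.cross_entropy(masked_forward(x, mask), target) + 0.01 * mask.sum()
    opt.zero_grad(); loss.backward(); opt.step()

critical = (torch.sigmoid(mask_logits) > 0.5).nonzero().flatten()
print(f"{len(critical)} critical channels selected at layer3")
```
Comparing the masks learned for a normal example and its adversarial counterpart is then one way to see where their datapaths diverge and merge.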
Related papers
- Hidden Activations Are Not Enough: A General Approach to Neural Network Predictions [0.0]
We introduce a novel mathematical framework for analyzing neural networks using tools from quiver representation theory.
By leveraging the induced quiver representation of a data sample, we capture more information than traditional hidden layer outputs.
Results are architecture-agnostic and task-agnostic, making them broadly applicable.
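As a rough, hypothetical analogue of a per-sample representation richer than the hidden outputs, one can record the activation-gated weight matrices an input actually uses in a ReLU MLP; the construction below is only an illustration, not the paper's quiver-theoretic definition.
```python
# Hypothetical sketch: for one input, collect the "effective" weight matrices obtained by
# gating each layer's weights with the units that fire for that input. This rough analogue
# carries more information than the hidden activations alone.
import torch
import torch.nn as nn

layers = [nn.Linear(8, 16), nn.Linear(16, 16), nn.Linear(16, 3)]

def induced_representation(x):
    reps, h = [], x
    for i, lin in enumerate(layers):
        pre = lin(h)
        # gate = 1 where the ReLU fires for this input (the last layer stays linear)
        gate = (pre > 0).float() if i < len(layers) - 1 else torch.ones_like(pre)
        reps.append(gate.unsqueeze(1) * lin.weight)   # the weights this sample actually uses
        h = pre * gate
    return reps

x = torch.randn(8)
for i, R in enumerate(induced_representation(x)):
    print(f"layer {i}: effective weights {tuple(R.shape)}, "
          f"{int((R != 0).any(dim=1).sum())} active units")
```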
arXiv Detail & Related papers (2024-09-20T02:35:13Z)
- On Discrepancies between Perturbation Evaluations of Graph Neural Network Attributions [49.8110352174327]
We assess attribution methods from a perspective not previously explored in the graph domain: retraining.
The core idea is to retrain the network on important (or not important) relationships as identified by the attributions.
We run our analysis on four state-of-the-art GNN attribution methods and five synthetic and real-world graph classification datasets.
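A hedged sketch of the remove-and-retrain protocol follows; it uses plain feature importances on tabular data to stay self-contained, whereas the paper retrains GNNs on graph relationships selected by attribution methods.
```python
# Hypothetical sketch of a remove-and-retrain style evaluation: drop the relationships
# (here: features) an attribution method calls important, retrain from scratch, and
# watch how much performance degrades.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=6, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

base = GradientBoostingClassifier(random_state=0).fit(Xtr, ytr)
importance = base.feature_importances_          # stand-in for an attribution method

def retrain_without(top_k):
    drop = np.argsort(importance)[::-1][:top_k]  # remove the k most "important" features
    keep = np.setdiff1d(np.arange(X.shape[1]), drop)
    clf = GradientBoostingClassifier(random_state=0).fit(Xtr[:, keep], ytr)
    return clf.score(Xte[:, keep], yte)

print("full model:", base.score(Xte, yte))
for k in (2, 5, 10):
    print(f"retrained without top-{k} attributed features:", retrain_without(k))
```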
arXiv Detail & Related papers (2024-01-01T02:03:35Z)
- Slope and generalization properties of neural networks [0.0]
We show that the distribution of the slope of a well-trained neural network classifier is generally independent of the width of the layers in a fully connected network.
The slope is of similar size throughout the relevant volume, and varies smoothly. It also behaves as predicted in rescaling examples.
We discuss possible applications of the slope concept, such as using it as a part of the loss function or stopping criterion during network training, or ranking data sets in terms of their complexity.
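One plausible proxy for a per-example slope is the spectral norm of the input-output Jacobian; the sketch below uses that proxy and should not be read as the paper's exact definition.
```python
# Hypothetical sketch: estimate a per-example "slope" as the largest singular value of
# the input-output Jacobian of a small classifier. Only a proxy for the quantity the
# paper studies; its precise definition should be taken from the text.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 3))

def slope(x):
    J = torch.autograd.functional.jacobian(net, x)    # shape (3, 10) for one example
    return torch.linalg.matrix_norm(J, ord=2).item()  # spectral norm

xs = torch.randn(5, 10)
print([round(slope(x), 3) for x in xs])
```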
arXiv Detail & Related papers (2021-07-03T17:54:27Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
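A minimal sketch of profiling activations for a clean input and a perturbed copy is shown below; random noise stands in for a real adversarial example, and the choice of layers is arbitrary.
```python
# Hypothetical sketch: record per-layer activation profiles for a clean input and a
# perturbed copy, then compare them to see where the two diverge inside the network.
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()
acts = {}

def make_hook(name):
    def hook(module, inputs, output):
        acts[name] = output.detach().flatten()
    return hook

for name, m in model.named_modules():
    if name in ("layer1", "layer2", "layer3", "layer4"):
        m.register_forward_hook(make_hook(name))

@torch.no_grad()
def profile(x):
    acts.clear()
    model(x)
    return {k: v.clone() for k, v in acts.items()}

x = torch.randn(1, 3, 224, 224)
clean = profile(x)
perturbed = profile(x + 0.03 * torch.sign(torch.randn_like(x)))  # noise stands in for an attack
for layer in clean:
    cos = torch.nn.functional.cosine_similarity(clean[layer], perturbed[layer], dim=0)
    print(f"{layer}: cosine similarity {cos.item():.3f}")
```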
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
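The contrastive pairing can be sketched on a toy attributed graph, with a linear encoder and bilinear scorer standing in for the paper's graph neural network; all details below are illustrative.
```python
# Hypothetical sketch: pair each node with its own neighborhood (positive) or a random
# node's neighborhood (negative), train a scorer to tell them apart, and treat nodes
# whose positive pairs score poorly as anomalous.
import torch
import torch.nn as nn

N, D = 100, 16
X = torch.randn(N, D)                                # node attributes
A = (torch.rand(N, N) < 0.05).float()
A = ((A + A.t()) > 0).float(); A.fill_diagonal_(1)   # symmetric adjacency with self-loops
neigh = (A / A.sum(1, keepdim=True)) @ X             # mean-aggregated neighborhood features

enc = nn.Linear(D, 32)
scorer = nn.Bilinear(32, 32, 1)
opt = torch.optim.Adam(list(enc.parameters()) + list(scorer.parameters()), lr=1e-2)
bce = nn.BCEWithLogitsLoss()

for _ in range(200):
    perm = torch.randperm(N)
    pos = scorer(enc(X), enc(neigh)).squeeze(1)        # node vs its own neighborhood
    neg = scorer(enc(X), enc(neigh[perm])).squeeze(1)  # node vs a random neighborhood
    loss = bce(pos, torch.ones(N)) + bce(neg, torch.zeros(N))
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    anomaly_score = -scorer(enc(X), enc(neigh)).squeeze(1)  # low positive score => anomalous
print("most anomalous nodes:", anomaly_score.topk(5).indices.tolist())
```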
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
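A hedged sketch of the genetic-algorithm component is given below; the fitness function is a placeholder for the transfer rate measured in the paper.
```python
# Hypothetical sketch of the genetic algorithm only: evolve a bit vector that selects an
# ensemble of surrogate models. In the paper, fitness would measure how well adversarial
# examples crafted against the selected ensemble fool held-out target models.
import random

NUM_MODELS, POP, GENS = 8, 20, 30
random.seed(0)

SCORES = [0.30, 0.50, 0.20, 0.70, 0.40, 0.60, 0.10, 0.45]   # placeholder per-model scores

def fitness(bits):
    return sum(s for s, b in zip(SCORES, bits) if b) - 0.15 * sum(bits)

def mutate(bits, p=0.1):
    return [b ^ (random.random() < p) for b in bits]

def crossover(a, b):
    cut = random.randrange(1, NUM_MODELS)
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(NUM_MODELS)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP // 2]
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(POP - len(parents))]
    pop = parents + children

print("selected surrogate ensemble (bit mask):", max(pop, key=fitness))
```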
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
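A minimal sketch of a gradient-supervision auxiliary loss on counterfactual pairs follows; the architecture, data, and weighting below are placeholders.
```python
# Hypothetical sketch: for a pair (x, x_cf) with different labels, encourage the input
# gradient of the counterfactual class score to align with the direction x_cf - x.
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
lam = 0.5                                    # weight of the auxiliary term (placeholder)

def training_step(x, y, x_cf, y_cf):
    x = x.clone().requires_grad_(True)
    logits = net(x)
    task_loss = F.cross_entropy(logits, y) + F.cross_entropy(net(x_cf), y_cf)

    # gradient supervision: the input gradient of the counterfactual class score
    # should point roughly from x toward its counterfactual x_cf
    score_cf = logits[torch.arange(len(y)), y_cf].sum()
    grad = torch.autograd.grad(score_cf, x, create_graph=True)[0]
    direction = (x_cf - x).detach()
    gs_loss = 1 - F.cosine_similarity(grad, direction, dim=1).mean()

    loss = task_loss + lam * gs_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

x = torch.randn(8, 16)
x_cf = x + 0.1 * torch.randn(8, 16)          # stand-in for real counterfactual pairs
y = torch.zeros(8, dtype=torch.long)
y_cf = torch.ones(8, dtype=torch.long)
print("loss on one toy batch:", training_step(x, y, x_cf, y_cf))
```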
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- Explaining Away Attacks Against Neural Networks [3.658164271285286]
We investigate the problem of identifying adversarial attacks on image-based neural networks.
We present intriguing experimental results showing significant discrepancies between the explanations generated for the predictions of a model on clean and adversarial data.
We propose a framework which can identify whether a given input is adversarial based on the explanations given by the model.
arXiv Detail & Related papers (2020-03-06T15:32:30Z)
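A toy sketch of the detection idea in the last entry: summarize a gradient-based explanation for each input and train a lightweight detector on those summaries. The features, detector, and noise-based "attacks" below are illustrative stand-ins.
```python
# Hypothetical sketch: compute a simple saliency explanation per input, reduce it to a
# few statistics, and fit a small classifier to flag adversarial-looking inputs.
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

model = models.resnet18(weights=None).eval()

def explanation_stats(x):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    logits[0, logits.argmax()].backward()
    sal = x.grad.abs()                        # gradient-based explanation
    return torch.stack([sal.mean(), sal.std(), sal.max(),
                        (sal > sal.mean()).float().mean()])

clean = [torch.randn(1, 3, 224, 224) for _ in range(8)]
advers = [x + 0.05 * torch.sign(torch.randn_like(x)) for x in clean]  # noise stands in for attacks

X = torch.stack([explanation_stats(x) for x in clean + advers]).numpy()
y = [0] * len(clean) + [1] * len(advers)
detector = LogisticRegression().fit(X, y)
print("toy detector training accuracy:", detector.score(X, y))
```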