Adversarial Perturbations Are Not So Weird: Entanglement of Robust and
Non-Robust Features in Neural Network Classifiers
- URL: http://arxiv.org/abs/2102.05110v1
- Date: Tue, 9 Feb 2021 20:21:31 GMT
- Title: Adversarial Perturbations Are Not So Weird: Entanglement of Robust and
Non-Robust Features in Neural Network Classifiers
- Authors: Jacob M. Springer, Melanie Mitchell, Garrett T. Kenyon
- Abstract summary: We show that in a neural network trained in a standard way, non-robust features respond to small, "non-semantic" patterns that are typically entangled with larger, robust patterns.
Adversarial examples can therefore be formed via minimal perturbations to these small, entangled patterns.
- Score: 4.511923587827301
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks trained on visual data are well-known to be vulnerable to
often imperceptible adversarial perturbations. The reasons for this
vulnerability are still being debated in the literature. Recently, Ilyas et al.
(2019) showed that this vulnerability arises, in part, because neural network
classifiers rely on highly predictive but brittle "non-robust" features. In
this paper we extend the work of Ilyas et al. by investigating the nature of
the input patterns that give rise to these features. In particular, we
hypothesize that in a neural network trained in a standard way, non-robust
features respond to small, "non-semantic" patterns that are typically entangled
with larger, robust patterns, known to be more human-interpretable, as opposed
to solely responding to statistical artifacts in a dataset. Thus, adversarial
examples can be formed via minimal perturbations to these small, entangled
patterns. In addition, we demonstrate a corollary of our hypothesis: robust
classifiers are more effective than standard (non-robust) ones as a source for
generating transferable adversarial examples in both the untargeted and
targeted settings. The results we present in this paper provide new insight
into the nature of the non-robust features responsible for adversarial
vulnerability of neural network classifiers.
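A corollary stated in the abstract is that robust classifiers are a better source of transferable adversarial examples than standard ones. The sketch below, which assumes PyTorch, an untargeted L-infinity PGD attack, inputs scaled to [0, 1], and two hypothetical pretrained classifiers `source_model` and `target_model`, shows how such a transfer experiment can be set up; it is an illustration, not the authors' exact protocol.

```python
# Hedged sketch of an untargeted transferability experiment: craft PGD
# adversarial examples on `source_model` and measure how often they also
# fool `target_model`. Model names, eps, alpha, and step count are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=20):
    """Untargeted L-infinity PGD: perturb x within an eps-ball to maximize the loss."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # keep a valid pixel range
        x_adv = x_adv.detach()
    return x_adv

def transfer_rate(source_model, target_model, loader, device="cpu"):
    """Fraction of adversarial examples crafted on source_model that fool target_model."""
    source_model.eval()
    target_model.eval()
    fooled, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(source_model, x, y)
        with torch.no_grad():
            pred = target_model(x_adv).argmax(dim=1)
        fooled += (pred != y).sum().item()
        total += y.numel()
    return fooled / total
```

Running `transfer_rate` once with an adversarially trained `source_model` and once with a standard one gives the kind of robust-versus-standard comparison the abstract describes.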
Related papers
- On the Robustness of Neural Collapse and the Neural Collapse of Robustness [6.227447957721122]
Neural Collapse refers to the curious phenomenon at the end of training of a neural network, where feature vectors and classification weights converge to a very simple geometrical arrangement (a simplex).
We study the stability properties of these simplices, and find that the simplex structure disappears under small adversarial attacks.
We identify novel properties of both robust and non-robust machine learning models, and show that earlier layers, unlike later ones, maintain reliable simplices on perturbed data.
arXiv Detail & Related papers (2023-11-13T16:18:58Z) - Investigating Human-Identifiable Features Hidden in Adversarial
Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z) - On the Computational Entanglement of Distant Features in Adversarial Machine Learning [8.87656044562629]
We introduce the concept of "computational entanglement".
Computational entanglement enables the network to achieve zero loss by fitting random noise, even on previously unseen test samples.
We present a novel application of computational entanglement in transforming worst-case adversarial examples, i.e., inputs that are highly non-robust.
arXiv Detail & Related papers (2023-09-27T14:09:15Z) - How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic, and generalisable framework for which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z) - Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge that limits the widespread adoption of deep learning has been its fragility to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation
Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - Non-Singular Adversarial Robustness of Neural Networks [58.731070632586594]
Adversarial robustness has become an emerging challenge for neural networks owing to their over-sensitivity to small input perturbations.
We formalize the notion of non-singular adversarial robustness for neural networks through the lens of joint perturbations to data inputs as well as model weights.
arXiv Detail & Related papers (2021-02-23T20:59:30Z) - Learning to Separate Clusters of Adversarial Representations for Robust
Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced concept of non-robust features.
In this paper, we consider non-robust features to be a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage that distribution for a likelihood-based adversarial detector (a hedged sketch of this idea is given after the related-papers list).
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - Relationship between manifold smoothness and adversarial vulnerability
in deep learning with local errors [2.7834038784275403]
We study the origin of the adversarial vulnerability in artificial neural networks.
Our study reveals that a high generalization accuracy requires a relatively fast power-law decay of the eigen-spectrum of hidden representations.
arXiv Detail & Related papers (2020-07-04T08:47:51Z) - Bayesian Neural Networks [0.0]
We show how errors in prediction by neural networks can be obtained in principle, and provide the two favoured methods for characterising these errors.
We will also describe how both of these methods have substantial pitfalls when put into practice.
arXiv Detail & Related papers (2020-06-02T09:43:00Z) - Metrics and methods for robustness evaluation of neural networks with
generative models [0.07366405857677225]
Recently, especially in computer vision, researchers discovered "natural" or "semantic" perturbations, such as rotations, changes of brightness, or more high-level changes.
We propose several metrics to measure robustness of classifiers to natural adversarial examples, and methods to evaluate them.
arXiv Detail & Related papers (2020-03-04T10:58:59Z)
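The "Metrics and methods for robustness evaluation" entry above proposes metrics for robustness to natural perturbations such as rotations and brightness changes. The sketch below, assuming torchvision and a hypothetical classifier `model` with a data `loader`, illustrates one simple metric of this kind (worst-case accuracy over a small grid of natural transformations); the paper's actual metrics may differ.

```python
# Hedged sketch of a natural-perturbation robustness metric: worst-case
# accuracy over a grid of rotations and brightness factors. `model`,
# `loader`, and the grid values are illustrative assumptions.
import torch
import torchvision.transforms.functional as TF

def natural_robustness(model, loader, angles=(-15, 0, 15),
                       brightness=(0.8, 1.0, 1.2), device="cpu"):
    """Return the worst accuracy over a grid of rotation/brightness perturbations."""
    model.eval()
    worst = 1.0
    for angle in angles:
        for b in brightness:
            correct, total = 0, 0
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                x_nat = TF.adjust_brightness(TF.rotate(x, angle), b)
                with torch.no_grad():
                    pred = model(x_nat).argmax(dim=1)
                correct += (pred == y).sum().item()
                total += y.numel()
            worst = min(worst, correct / total)
    return worst
```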
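The "Learning to Separate Clusters of Adversarial Representations" entry above describes estimating the distribution of adversarial representations in a separate cluster and using it for a likelihood-based detector. The sketch below, assuming scikit-learn and pre-extracted penultimate-layer features, illustrates that general idea with a single Gaussian; it is a simplification, not the paper's detector.

```python
# Hedged sketch of a likelihood-based adversarial detector: fit a Gaussian to
# representations of known adversarial examples and flag new inputs whose
# log-likelihood under that Gaussian exceeds a threshold. The feature
# extraction step and the single-Gaussian model are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_adversarial_cluster(adv_features: np.ndarray) -> GaussianMixture:
    """Fit a one-component Gaussian to adversarial representations (n_samples x n_dims)."""
    gmm = GaussianMixture(n_components=1, covariance_type="full")
    gmm.fit(adv_features)
    return gmm

def detect(gmm: GaussianMixture, features: np.ndarray, threshold: float) -> np.ndarray:
    """Return a boolean mask marking inputs judged adversarial by the likelihood test."""
    return gmm.score_samples(features) > threshold
```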