The Dimpled Manifold Model of Adversarial Examples in Machine Learning
- URL: http://arxiv.org/abs/2106.10151v1
- Date: Fri, 18 Jun 2021 14:32:55 GMT
- Title: The Dimpled Manifold Model of Adversarial Examples in Machine Learning
- Authors: Adi Shamir, Odelia Melamed, Oriel BenShmuel
- Abstract summary: In this paper we introduce a new conceptual framework which provides a simple explanation for why adversarial examples exist.
In the last part of the paper we describe the results of numerous experiments which strongly support this new model.
- Score: 6.6690527698171165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The extreme fragility of deep neural networks when presented with tiny
perturbations in their inputs was independently discovered by several research
groups in 2013, but in spite of enormous effort these adversarial examples
remained a baffling phenomenon with no clear explanation. In this paper we
introduce a new conceptual framework (which we call the Dimpled Manifold Model)
which provides a simple explanation for why adversarial examples exist, why
their perturbations have such tiny norms, why these perturbations look like
random noise, and why a network which was adversarially trained with
incorrectly labeled images can still correctly classify test images. In the
last part of the paper we describe the results of numerous experiments which
strongly support this new model, and in particular our assertion that
adversarial perturbations are roughly perpendicular to the low dimensional
manifold which contains all the training examples.
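A minimal sketch of how the central claim above could be probed, assuming a linear PCA approximation of the training-data manifold and a generic, already-computed adversarial example; the function, the component count, and the synthetic data are illustrative assumptions, not the authors' experimental setup:

```python
# Illustrative check (not the authors' code): approximate the training-data
# manifold with a linear PCA subspace and measure how much of an adversarial
# perturbation lies orthogonal to it. Under the dimpled-manifold claim, most
# of the perturbation's norm should sit in the off-manifold component.
import numpy as np
from sklearn.decomposition import PCA

def off_manifold_fraction(train_X, x, x_adv, n_components=50):
    """Return ||off-manifold part|| / ||perturbation|| for x_adv - x.

    train_X : (N, D) flattened training images
    x, x_adv: (D,)  clean input and its adversarial counterpart
    """
    pca = PCA(n_components=n_components).fit(train_X)
    delta = x_adv - x                        # adversarial perturbation
    basis = pca.components_                  # (k, D), orthonormal rows
    on_manifold = basis.T @ (basis @ delta)  # projection onto the PCA subspace
    off_manifold = delta - on_manifold
    return np.linalg.norm(off_manifold) / np.linalg.norm(delta)

# Toy example: points on a 5-dimensional subspace of R^100 and a random
# stand-in "perturbation", which is expected to be mostly off-manifold.
rng = np.random.default_rng(0)
train_X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 100))
x = train_X[0]
x_adv = x + 0.1 * rng.normal(size=100)
print(off_manifold_fraction(train_X, x, x_adv, n_components=5))
```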
Related papers
- Transcending Adversarial Perturbations: Manifold-Aided Adversarial
Examples with Legitimate Semantics [10.058463432437659]
Deep neural networks are highly vulnerable to adversarial examples crafted with tiny malicious perturbations.
In this paper, we propose a supervised semantic-transformation generative model to generate adversarial examples with real and legitimate semantics.
Experiments on MNIST and industrial defect datasets showed that our adversarial examples not only exhibited better visual quality but also achieved superior attack transferability.
arXiv Detail & Related papers (2024-02-05T15:25:40Z) - Does Saliency-Based Training bring Robustness for Deep Neural Networks
in Image Classification? [0.0]
The black-box nature of deep neural networks impedes a complete understanding of their inner workings.
Online saliency-guided training methods try to highlight the prominent features in the model's output to alleviate this problem.
We quantify the robustness and conclude that, despite the well-explained visualizations in the model's output, saliency-trained models suffer from lower performance against adversarial example attacks.
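As a rough illustration of the family of methods the summary above refers to, here is a hedged sketch of one saliency-guided training step (masking the least-salient inputs and penalizing output divergence); the masking rule, loss weighting, and helper name are assumptions, not the paper's exact procedure:

```python
# Hedged sketch of one flavor of saliency-guided training: mask the
# lowest-saliency pixels (by input gradient) and add a KL term that keeps the
# outputs on the original and masked inputs close. Details are assumptions.
import torch
import torch.nn.functional as F

def saliency_guided_step(model, optimizer, x, y, mask_frac=0.5, lam=1.0):
    # Input-gradient saliency for the current batch.
    x = x.clone().requires_grad_(True)
    grads = torch.autograd.grad(F.cross_entropy(model(x), y), x)[0].abs()
    flat = grads.flatten(1)
    k = int(mask_frac * flat.shape[1])
    _, low_idx = flat.topk(k, dim=1, largest=False)   # least-salient positions
    masked = x.detach().clone().flatten(1)
    masked.scatter_(1, low_idx, 0.0)                  # zero them out
    masked = masked.view_as(x)

    # Standard cross-entropy plus an output-consistency penalty.
    optimizer.zero_grad()
    logits = model(x.detach())
    consistency = F.kl_div(F.log_softmax(model(masked), dim=1),
                           F.softmax(logits.detach(), dim=1),
                           reduction="batchmean")
    loss = F.cross_entropy(logits, y) + lam * consistency
    loss.backward()
    optimizer.step()
    return loss.item()
```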
arXiv Detail & Related papers (2023-06-28T22:20:19Z) - A Frequency Perspective of Adversarial Robustness [72.48178241090149]
We present a frequency-based understanding of adversarial examples, supported by theoretical and empirical findings.
Our analysis shows that adversarial examples are confined neither to high-frequency nor to low-frequency components; their frequency profile is simply dataset dependent.
We propose a frequency-based explanation for the commonly observed accuracy vs. robustness trade-off.
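As an illustrative aside (not the paper's analysis pipeline), the frequency content of a perturbation can be inspected with a radially binned power spectrum; the binning scheme below is an assumption:

```python
# Illustrative sketch: radially binned power spectrum of a (single-channel)
# perturbation, to see whether its energy sits in low, mid, or high bands.
import numpy as np

def radial_spectrum(delta, n_bins=16):
    """delta: (H, W) perturbation; returns summed power per radial band."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(delta))) ** 2
    h, w = delta.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)              # distance from DC
    edges = np.linspace(0, r.max() + 1e-9, n_bins + 1)
    return np.array([power[(r >= lo) & (r < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

# Example: a smooth (low-frequency) pattern vs. broadband noise.
t = np.linspace(0, 2 * np.pi, 32)
smooth = np.outer(np.sin(t), np.cos(t))
noise = np.random.default_rng(0).normal(size=(32, 32))
print(radial_spectrum(smooth).argmax(), radial_spectrum(noise).argmax())
```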
arXiv Detail & Related papers (2021-10-26T19:12:34Z) - Unsupervised Detection of Adversarial Examples with Model Explanations [0.6091702876917279]
We propose a simple yet effective method to detect adversarial examples using methods developed to explain the model's behavior.
Our evaluations on the MNIST handwritten digit dataset show that our method is capable of detecting adversarial examples with high confidence.
arXiv Detail & Related papers (2021-07-22T06:54:18Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation
Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - Detecting Adversarial Examples by Input Transformations, Defense
Perturbations, and Voting [71.57324258813674]
Convolutional neural networks (CNNs) have been shown to reach super-human performance in visual recognition tasks.
However, CNNs can easily be fooled by adversarial examples, i.e., maliciously crafted images that force the networks to predict an incorrect output.
This paper extensively explores the detection of adversarial examples via image transformations and proposes a novel methodology.
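A hedged sketch of the general transformation-and-voting idea summarized above; the choice of transformations, the agreement threshold, and the helper name are illustrative assumptions rather than the paper's methodology:

```python
# Sketch: run the classifier on several mildly transformed copies of an input
# and flag the input as suspicious when the votes disagree too much.
import torch

def transformation_vote_detector(model, x, transforms, agreement_threshold=0.7):
    """x: (1, C, H, W) image; transforms: callables mapping images to images.

    Returns (majority_label, is_suspicious).
    """
    model.eval()
    with torch.no_grad():
        preds = [model(t(x)).argmax(dim=1).item() for t in transforms]
        preds.append(model(x).argmax(dim=1).item())    # vote of the untouched input
    votes = torch.tensor(preds)
    majority = votes.mode().values.item()
    agreement = (votes == majority).float().mean().item()
    # Clean images tend to keep their label under mild transformations;
    # adversarial ones often do not.
    return majority, agreement < agreement_threshold
```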
arXiv Detail & Related papers (2021-01-27T14:50:41Z) - Exploring Simple Siamese Representation Learning [68.37628268182185]
We show that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders.
Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapse.
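A minimal sketch of the stop-gradient mechanism highlighted above, in the style of a SimSiam loss; the encoder f and predictor h are placeholders:

```python
# Symmetric negative-cosine loss with stop-gradient on the target branch,
# which is the ingredient credited with preventing representation collapse.
import torch.nn.functional as F

def simsiam_loss(f, h, x1, x2):
    """x1, x2: two augmented views of the same image batch."""
    z1, z2 = f(x1), f(x2)                  # representations
    p1, p2 = h(z1), h(z2)                  # predictions
    d = lambda p, z: -F.cosine_similarity(p, z.detach(), dim=-1).mean()
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)
```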
arXiv Detail & Related papers (2020-11-20T18:59:33Z) - On the Transferability of Adversarial Attacks against Neural Text
Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z) - Verifying the Causes of Adversarial Examples [5.381050729919025]
The robustness of neural networks is challenged by adversarial examples that contain almost imperceptible perturbations to inputs.
We present a collection of potential causes of adversarial examples and verify (or partially verify) them through carefully-designed controlled experiments.
Our experiment results show that geometric factors tend to be more direct causes and statistical factors magnify the phenomenon.
arXiv Detail & Related papers (2020-10-19T16:17:20Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)