Searching for the Essence of Adversarial Perturbations
- URL: http://arxiv.org/abs/2205.15357v1
- Date: Mon, 30 May 2022 18:04:57 GMT
- Title: Searching for the Essence of Adversarial Perturbations
- Authors: Dennis Y. Menn and Hung-yi Lee
- Abstract summary: We show that adversarial perturbations contain human-recognizable information, which is the key conspirator responsible for a neural network's erroneous prediction.
This concept of human-recognizable information allows us to explain key features related to adversarial perturbations.
- Score: 73.96215665913797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks have achieved state-of-the-art performance in various
machine learning fields, yet the incorporation of malicious perturbations into
input data (adversarial examples) can fool neural networks into making erroneous
predictions. This poses potential risks in real-world applications such as
autopilot systems and facial recognition. However, the reason for the existence
of adversarial examples remains controversial. Here we demonstrate that
adversarial perturbations contain human-recognizable information, which is the
key conspirator responsible for a neural network's erroneous prediction. This
concept of human-recognizable information allows us to explain key features
related to adversarial perturbations, including the existence of adversarial
examples, their transferability among different neural networks, and the
increased interpretability of adversarially trained networks. Two unique
properties of adversarial perturbations that fool neural networks are uncovered:
masking and generation. A special class, the complementary class, is identified
when neural networks classify input images. The human-recognizable information
contained in adversarial perturbations allows researchers to gain insight into
the working principles of neural networks and may lead to techniques that
detect and defend against adversarial attacks.
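The abstract does not specify how the perturbations were produced; as a minimal, hedged sketch of the general setup, the code below generates an adversarial perturbation with the standard fast gradient sign method (FGSM) and isolates it for inspection. The pretrained ResNet-18, the random image tensor, and the label are placeholders rather than the paper's actual models or data.

```python
# Hedged FGSM sketch (not necessarily the paper's procedure): generate an
# adversarial perturbation for a pretrained classifier and isolate it.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder for a real, preprocessed image
label = torch.tensor([207])                              # placeholder ground-truth class index

loss = torch.nn.functional.cross_entropy(model(image), label)
loss.backward()

epsilon = 8 / 255
perturbation = epsilon * image.grad.sign()               # FGSM: step along the sign of the input gradient
adversarial = (image + perturbation).detach().clamp(0, 1)

# The perturbation itself can be rescaled to [0, 1] and displayed as an image,
# which is how structure in perturbations is typically inspected.
visualized = (perturbation - perturbation.min()) / (perturbation.max() - perturbation.min())
print(model(adversarial).argmax(dim=1))                  # prediction on the perturbed input
```

Inspecting such rescaled perturbations (or averages of them over many inputs) is one way the kind of human-recognizable structure claimed in the abstract could be examined, though the paper's own procedure may differ.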
Related papers
- Towards unlocking the mystery of adversarial fragility of neural networks [6.589200529058999]
We look at the smallest magnitude of additive perturbation that can change the output of a classification algorithm.
We provide a matrix-theoretic explanation of the adversarial fragility of deep neural networks for classification.
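As a hedged illustration of the quantity described above (not the paper's matrix-theoretic derivation), for a binary linear classifier the smallest additive perturbation that changes the prediction has a closed form: the distance from the input to the decision boundary.

```latex
% Smallest additive perturbation that flips a binary linear classifier
% f(x) = sign(w^T x + b); its size equals the distance from x to the boundary.
\[
  \min_{\delta}\; \lVert \delta \rVert_2
  \quad\text{s.t.}\quad
  \operatorname{sign}\bigl(w^{\top}(x+\delta)+b\bigr) \neq \operatorname{sign}\bigl(w^{\top}x+b\bigr)
  \;\;\Longrightarrow\;\;
  \lVert \delta^{\ast} \rVert_2 = \frac{\lvert\, w^{\top}x+b \,\rvert}{\lVert w \rVert_2}.
\]
```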
arXiv Detail & Related papers (2024-06-23T19:37:13Z)
- A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the transferability of adversarial examples.
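Black-box transfer is typically measured by crafting adversarial examples on a surrogate model and checking how often they also fool a separately trained target model. The sketch below is a minimal, hedged illustration of that protocol; the ResNet-18 surrogate, VGG-16 target, one-step FGSM attack, and random tensors are placeholders, not choices taken from the survey.

```python
# Hedged transfer-attack sketch: craft adversarial examples on a surrogate
# model, then evaluate them on a different target model (black-box transfer).
import torch
import torchvision.models as models

surrogate = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
target = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()

images = torch.rand(4, 3, 224, 224, requires_grad=True)  # placeholder batch
labels = torch.randint(0, 1000, (4,))                    # placeholder labels

# Craft adversarial examples using only the surrogate's gradients; the
# target's gradients are never touched.
loss = torch.nn.functional.cross_entropy(surrogate(images), labels)
loss.backward()
adversarial = (images + (8 / 255) * images.grad.sign()).detach().clamp(0, 1)

with torch.no_grad():
    fooled = (target(adversarial).argmax(dim=1) != labels).float().mean().item()
print(f"fraction of adversarial examples that also fool the unseen target: {fooled:.2f}")
```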
arXiv Detail & Related papers (2023-10-26T17:45:26Z)
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- A reading survey on adversarial machine learning: Adversarial attacks and their understanding [6.1678491628787455]
Adversarial machine learning exploits and seeks to understand the vulnerabilities that cause neural networks to misclassify inputs that are close to the original.
A class of algorithms called adversarial attacks has been proposed to make neural networks misclassify inputs across various tasks and domains.
This article provides a survey of existing adversarial attacks and their understanding based on different perspectives.
arXiv Detail & Related papers (2023-08-07T07:37:26Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Vulnerability Under Adversarial Machine Learning: Bias or Variance? [77.30759061082085]
We investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network.
Our analysis sheds light on why deep neural networks have poor performance under adversarial perturbation.
We introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies.
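As a hedged reminder of the quantities investigated above (the paper's exact decomposition is not reproduced here), the classical bias-variance decomposition of the expected squared error, evaluated at an adversarially perturbed input and averaged over training sets D, reads:

```latex
% Classical bias-variance decomposition of expected squared error, evaluated at
% a (possibly adversarially) perturbed input x + \delta; f_D is the model
% trained on a random dataset D and y(x) is the true target.
\[
  \mathbb{E}_{D}\!\left[\bigl(f_D(x+\delta) - y(x)\bigr)^{2}\right]
  = \underbrace{\bigl(\mathbb{E}_{D}[f_D(x+\delta)] - y(x)\bigr)^{2}}_{\text{bias}^{2}}
  + \underbrace{\mathbb{E}_{D}\!\left[\bigl(f_D(x+\delta) - \mathbb{E}_{D}[f_D(x+\delta)]\bigr)^{2}\right]}_{\text{variance}}.
\]
```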
arXiv Detail & Related papers (2020-08-01T00:58:54Z)
- Relationship between manifold smoothness and adversarial vulnerability in deep learning with local errors [2.7834038784275403]
We study the origin of the adversarial vulnerability in artificial neural networks.
Our study reveals that a high generalization accuracy requires a relatively fast power-law decay of the eigen-spectrum of hidden representations.
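As a hedged sketch of the quantity referred to above, the eigen-spectrum of hidden representations is usually taken to be the eigenvalues of the covariance of a layer's activations, with a log-log slope estimating the power-law decay rate; the random activations below are only a stand-in for real hidden-layer outputs.

```python
# Hedged sketch: eigen-spectrum of hidden representations and the log-log slope
# used to judge how fast it decays as a power law. A random matrix stands in
# for real hidden-layer activations of shape (n_samples, n_units).
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.standard_normal((2000, 512))      # stand-in hidden representations

cov = np.cov(hidden, rowvar=False)             # covariance across hidden units
eigvals = np.linalg.eigvalsh(cov)[::-1]        # eigen-spectrum, descending

ranks = np.arange(1, len(eigvals) + 1)
slope, _ = np.polyfit(np.log(ranks), np.log(eigvals), 1)
print(f"estimated power-law exponent of the eigen-spectrum: {slope:.2f}")
```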
arXiv Detail & Related papers (2020-07-04T08:47:51Z)
- Bayesian Neural Networks [0.0]
We show how errors in prediction by neural networks can be obtained in principle, and provide the two favoured methods for characterising these errors.
We will also describe how both of these methods have substantial pitfalls when put into practice.
arXiv Detail & Related papers (2020-06-02T09:43:00Z)
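The Bayesian Neural Networks summary above does not name the two favoured methods; as one common, hedged illustration of how prediction errors can be characterised in practice, the sketch below uses Monte Carlo dropout: dropout is kept active at test time and the spread over repeated stochastic forward passes serves as an error estimate. The toy network and inputs are placeholders.

```python
# Hedged sketch of one common way to characterise prediction uncertainty:
# Monte Carlo dropout, i.e. keeping dropout active at test time and using the
# spread of repeated stochastic forward passes as an error bar. Not necessarily
# one of the two methods discussed in the paper above.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1),
)

x = torch.randn(8, 16)          # placeholder inputs
model.train()                   # keep dropout stochastic at inference time

with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])  # (100, 8, 1)

mean = samples.mean(dim=0)      # predictive mean
std = samples.std(dim=0)        # per-input spread, used as an error estimate
print(mean.squeeze(), std.squeeze())
```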
This list is automatically generated from the titles and abstracts of the papers on this site.