Investigating Human-Identifiable Features Hidden in Adversarial
Perturbations
- URL: http://arxiv.org/abs/2309.16878v1
- Date: Thu, 28 Sep 2023 22:31:29 GMT
- Title: Investigating Human-Identifiable Features Hidden in Adversarial
Perturbations
- Authors: Dennis Y. Menn, Tzu-hsun Feng, Sriram Vishwanath, Hung-yi Lee
- Abstract summary: Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
- Score: 54.39726653562144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks perform exceedingly well across various machine learning
tasks but are not immune to adversarial perturbations. This vulnerability has
implications for real-world applications. While much research has been
conducted, the underlying reasons why neural networks fall prey to adversarial
attacks are not yet fully understood. Central to our study, which explores up
to five attack algorithms across three datasets, is the identification of
human-identifiable features in adversarial perturbations. Additionally, we
uncover two distinct effects manifesting within human-identifiable features.
Specifically, the masking effect is prominent in untargeted attacks, while the
generation effect is more common in targeted attacks. Using pixel-level
annotations, we extract such features and demonstrate their ability to
compromise target models. In addition, our findings indicate a notable extent
of similarity in perturbations across different attack algorithms when averaged
over multiple models. This work also provides insights into phenomena
associated with adversarial perturbations, such as transferability and model
interpretability. Our study contributes to a deeper understanding of the
underlying mechanisms behind adversarial attacks and offers insights for the
development of more resilient defense strategies for neural networks.
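Below is a minimal sketch of the multi-model perturbation-averaging idea described in the abstract, assuming a basic untargeted FGSM attack implemented in plain PyTorch; the attack choice, the specific pretrained models, and the helper names are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch (assumption): average FGSM perturbations from several models to
# expose shared, human-identifiable structure, as discussed in the abstract.
import torch
import torchvision.models as models

def fgsm_perturbation(model, x, y, eps=8 / 255):
    """Return an untargeted FGSM perturbation for a single model."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return eps * x.grad.sign()

def averaged_perturbation(model_list, x, y, eps=8 / 255):
    """Average per-model perturbations over multiple models."""
    perts = [fgsm_perturbation(m, x, y, eps) for m in model_list]
    return torch.stack(perts).mean(dim=0)

# Illustrative usage with a few ImageNet-pretrained models (hypothetical choice).
model_list = [models.resnet18(weights="IMAGENET1K_V1").eval(),
              models.vgg16(weights="IMAGENET1K_V1").eval(),
              models.densenet121(weights="IMAGENET1K_V1").eval()]
x = torch.rand(1, 3, 224, 224)   # placeholder input image
y = torch.tensor([207])          # placeholder label
delta = averaged_perturbation(model_list, x, y)
# Visualizing `delta` (e.g., with per-channel min-max normalization) is where
# human-identifiable features would be inspected.
```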
Related papers
- Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis [12.133306321357999]
We propose an uncertainty-based method for detecting adversarial attacks on neural networks for semantic segmentation.
We conduct a detailed analysis of uncertainty-based detection of adversarial attacks across various state-of-the-art neural networks.
Our numerical experiments show the effectiveness of the proposed uncertainty-based detection method.
arXiv Detail & Related papers (2024-08-19T14:13:30Z)
- Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations [3.4530027457862]
This study explores the impact of adversarial perturbations on Convolutional Neural Networks (CNNs).
We introduce the Adversarial Intervention framework to study the vulnerability of a CNN to adversarial perturbations.
arXiv Detail & Related papers (2024-05-31T08:14:44Z)
- A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the transferability of adversarial examples.
arXiv Detail & Related papers (2023-10-26T17:45:26Z)
- A reading survey on adversarial machine learning: Adversarial attacks and their understanding [6.1678491628787455]
Adversarial Machine Learning exploits and seeks to understand the vulnerabilities that cause neural networks to misclassify near-original inputs.
A class of algorithms called adversarial attacks has been proposed to make neural networks misclassify inputs across various tasks in different domains.
This article provides a survey of existing adversarial attacks and their understanding based on different perspectives.
arXiv Detail & Related papers (2023-08-07T07:37:26Z)
- Searching for the Essence of Adversarial Perturbations [73.96215665913797]
We show that adversarial perturbations contain human-recognizable information, which is the key conspirator responsible for a neural network's erroneous prediction.
This concept of human-recognizable information allows us to explain key features related to adversarial perturbations.
arXiv Detail & Related papers (2022-05-30T18:04:57Z)
- Identification of Attack-Specific Signatures in Adversarial Examples [62.17639067715379]
We show that different attack algorithms produce adversarial examples which are distinct not only in their effectiveness but also in how they qualitatively affect their victims.
Our findings suggest that prospective adversarial attacks should be compared not only via their success rates at fooling models but also via deeper downstream effects they have on victims.
arXiv Detail & Related papers (2021-10-13T15:40:48Z)
- Attack to Fool and Explain Deep Networks [59.97135687719244]
Countering the argument that adversarial perturbations are mere noise-like patterns misaligned with human perception, we provide evidence of human-meaningful patterns in adversarial perturbations.
Our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
arXiv Detail & Related papers (2021-06-20T03:07:36Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these activation profiles can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Relationship between manifold smoothness and adversarial vulnerability in deep learning with local errors [2.7834038784275403]
We study the origin of the adversarial vulnerability in artificial neural networks.
Our study reveals that a high generalization accuracy requires a relatively fast power-law decay of the eigen-spectrum of hidden representations.
arXiv Detail & Related papers (2020-07-04T08:47:51Z)