Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations
- URL: http://arxiv.org/abs/2405.20672v1
- Date: Fri, 31 May 2024 08:14:44 GMT
- Title: Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations
- Authors: Davide Coppola, Hwee Kuan Lee
- Abstract summary: This study explores the impact of adversarial perturbations on Convolutional Neural Networks (CNNs).
We introduce the Adversarial Intervention framework to study the vulnerability of a CNN to adversarial perturbations.
- Score: 3.4530027457862
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study explores the impact of adversarial perturbations on Convolutional Neural Networks (CNNs) with the aim of enhancing the understanding of their underlying mechanisms. Despite numerous defense methods proposed in the literature, there is still an incomplete understanding of this phenomenon. Instead of treating the entire model as vulnerable, we propose that specific feature maps learned during training contribute to the overall vulnerability. To investigate how the hidden representations learned by a CNN affect its vulnerability, we introduce the Adversarial Intervention framework. Experiments were conducted on models trained on three well-known computer vision datasets, subjecting them to attacks of different natures. Our focus centers on the effects that adversarial perturbations to a model's initial layer have on the overall behavior of the model. Empirical results revealed compelling insights: a) perturbing selected channel combinations in shallow layers causes significant disruptions; b) the channel combinations most responsible for the disruptions are common among different types of attacks; c) despite shared vulnerable combinations of channels, different attacks affect hidden representations with varying magnitudes; d) there exists a positive correlation between a kernel's magnitude and its vulnerability. In conclusion, this work introduces a novel framework to study the vulnerability of a CNN model to adversarial perturbations, revealing insights that contribute to a deeper understanding of the phenomenon. The identified properties pave the way for the development of efficient ad-hoc defense mechanisms in future applications.
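A minimal PyTorch sketch of the kind of channel-level intervention the abstract describes. Everything here is an illustrative assumption rather than the authors' Adversarial Intervention procedure: the ResNet-18 backbone, the choice of the first convolutional layer, the channel indices, and the Gaussian noise scale.

```python
# Sketch only (not the authors' code): add noise to a chosen subset of channels
# in the output of a CNN's first convolutional layer and count how many
# predictions flip as a result.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()  # a trained model would be used in practice

TARGET_CHANNELS = [3, 7, 12]   # hypothetical "vulnerable" channel combination
EPSILON = 0.5                  # hypothetical perturbation magnitude

def perturb_channels(module, inputs, output):
    """Forward hook: perturb only the selected feature-map channels."""
    noise = torch.zeros_like(output)
    noise[:, TARGET_CHANNELS] = EPSILON * torch.randn_like(noise[:, TARGET_CHANNELS])
    return output + noise

x = torch.randn(8, 3, 224, 224)  # stand-in batch of images

with torch.no_grad():
    clean_pred = model(x).argmax(dim=1)

handle = model.conv1.register_forward_hook(perturb_channels)
with torch.no_grad():
    perturbed_pred = model(x).argmax(dim=1)
handle.remove()

print("fraction of predictions flipped:", (clean_pred != perturbed_pred).float().mean().item())

# Finding (d) relates a kernel's magnitude to its vulnerability; the per-kernel
# L2 norms of the first layer can be inspected directly for such a correlation.
print("first-layer kernel norms:", model.conv1.weight.flatten(1).norm(dim=1))
```

With a trained model and real images, repeating this over many channel subsets would indicate which shallow-layer combinations are most disruptive, in the spirit of findings (a) and (b).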
Related papers
- The Anatomy of Adversarial Attacks: Concept-based XAI Dissection [1.2916188356754918]
We study the influence of adversarial attacks (AAs) on the concepts learned by convolutional neural networks (CNNs) using explainable AI (XAI) techniques.
AAs induce substantial alterations in the concept composition within the feature space, introducing new concepts or modifying existing ones.
Our findings pave the way for the development of more robust and interpretable deep learning models.
arXiv Detail & Related papers (2024-03-25T13:57:45Z)
- A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the transferability of adversarial examples.
arXiv Detail & Related papers (2023-10-26T17:45:26Z)
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning [33.18197518590706]
Adversarial examples derived from deliberately crafted perturbations on visual inputs can easily harm the decision process of deep neural networks.
We introduce a causal approach called Adversarial Double Machine Learning (ADML) which allows us to quantify the degree of adversarial vulnerability for network predictions.
ADML can directly estimate the causal parameter of adversarial perturbations and mitigate the negative effects that can potentially damage robustness.
arXiv Detail & Related papers (2023-07-14T09:51:26Z)
- ExploreADV: Towards exploratory attack for Neural Networks [0.33302293148249124]
ExploreADV is a general and flexible adversarial attack system that is capable of modeling regional and imperceptible attacks.
We show that our system offers users good flexibility to focus on sub-regions of inputs, explore imperceptible perturbations and understand the vulnerability of pixels/regions to adversarial attacks.
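A minimal sketch of the regional-attack idea, assuming a standard single-step FGSM restricted by a binary mask; this is not the ExploreADV implementation, and the patch size and epsilon below are arbitrary.

```python
# Illustrative sketch only (not ExploreADV): one FGSM step confined to a binary
# mask, so only the chosen sub-region of the input is perturbed.
import torch
import torch.nn.functional as F

def masked_fgsm(model, x, y, mask, epsilon=0.03):
    """Apply one gradient-sign step where mask == 1; leave other pixels untouched."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign() * mask
    return x_adv.clamp(0.0, 1.0).detach()

# Example mask: perturb only the top-left 8x8 patch of a 32x32 image.
mask = torch.zeros(1, 1, 32, 32)
mask[..., :8, :8] = 1.0
```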
arXiv Detail & Related papers (2023-01-01T07:17:03Z)
- Adversarial Robustness through the Lens of Causality [105.51753064807014]
The adversarial vulnerability of deep neural networks has attracted significant attention in machine learning.
We propose to incorporate causality into mitigating adversarial vulnerability.
Our method can be seen as the first attempt to leverage causality for mitigating adversarial vulnerability.
arXiv Detail & Related papers (2021-06-11T06:55:02Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing activation profiles can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), which aims to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
- Transferable Perturbations of Deep Feature Distributions [102.94094966908916]
This work presents a new adversarial attack based on the modeling and exploitation of class-wise and layer-wise deep feature distributions.
We achieve state-of-the-art targeted black-box transfer-based attack results for undefended ImageNet models.
arXiv Detail & Related papers (2020-04-27T00:32:25Z)