The Anatomy of Adversarial Attacks: Concept-based XAI Dissection
- URL: http://arxiv.org/abs/2403.16782v1
- Date: Mon, 25 Mar 2024 13:57:45 GMT
- Title: The Anatomy of Adversarial Attacks: Concept-based XAI Dissection
- Authors: Georgii Mikriukov, Gesina Schwalbe, Franz Motzkus, Korinna Bade
- Abstract summary: We study the influence of AAs on the concepts learned by convolutional neural networks (CNNs) using XAI techniques.
AAs induce substantial alterations in the concept composition within the feature space, introducing new concepts or modifying existing ones.
Our findings pave the way for the development of more robust and interpretable deep learning models.
- Score: 1.2916188356754918
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial attacks (AAs) pose a significant threat to the reliability and robustness of deep neural networks. While the impact of these attacks on model predictions has been extensively studied, their effect on the learned representations and concepts within these models remains largely unexplored. In this work, we perform an in-depth analysis of the influence of AAs on the concepts learned by convolutional neural networks (CNNs) using eXplainable artificial intelligence (XAI) techniques. Through an extensive set of experiments across various network architectures and targeted AA techniques, we unveil several key findings. First, AAs induce substantial alterations in the concept composition within the feature space, introducing new concepts or modifying existing ones. Second, the adversarial perturbation itself can be linearly decomposed into a set of latent vector components, with a subset of these being responsible for the attack's success. Notably, we discover that these components are target-specific, i.e., are similar for a given target class throughout different AA techniques and starting classes. Our findings provide valuable insights into the nature of AAs and their impact on learned representations, paving the way for the development of more robust and interpretable deep learning models, as well as effective defenses against adversarial threats.
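The abstract's key finding, that an attack's effect can be read as a linear recombination of concept directions in latent space, can be illustrated with a small, hedged sketch. The backbone, the hooked layer, the one-step FGSM stand-in for the paper's targeted attacks, and the randomly initialized concept activation vectors below are all illustrative assumptions, not the authors' experimental setup.

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

# Hook an intermediate layer to capture feature-space activations.
feats = {}
model.layer4.register_forward_hook(
    lambda _, __, out: feats.__setitem__("z", out.flatten(start_dim=1))
)

def fgsm(x, y, eps=8 / 255):
    # One-step FGSM, used here only as a stand-in for the paper's targeted attacks.
    x = x.clone().requires_grad_(True)
    torch.nn.functional.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

x = torch.rand(1, 3, 224, 224)   # placeholder input image
y = model(x).argmax(dim=1)       # placeholder label

model(x)
z_clean = feats["z"].detach()
model(fgsm(x, y))
z_adv = feats["z"].detach()

delta = (z_adv - z_clean).squeeze(0)   # latent shift induced by the perturbation

# Hypothetical concept directions (e.g. pre-mined CAVs); random placeholders here.
cavs = torch.randn(10, delta.numel())
coeffs = torch.linalg.lstsq(cavs.T, delta.unsqueeze(1)).solution.squeeze()
print("concept coefficients of the latent shift:", coeffs)
```

Under these assumptions, the "anatomy" of an attack would appear as a handful of dominant coefficients, and the paper's target-specificity finding corresponds to similar coefficient patterns recurring across attack methods and source classes for a fixed target class.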
Related papers
- Edge-Only Universal Adversarial Attacks in Distributed Learning [49.546479320670464]
In this work, we explore the feasibility of generating universal adversarial attacks when an attacker has access to the edge part of the model only.
Our approach shows that adversaries can induce effective mispredictions in the unknown cloud part by leveraging key features on the edge side.
Our results on ImageNet demonstrate strong attack transferability to the unknown cloud part.
arXiv Detail & Related papers (2024-11-15T11:06:24Z) - Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations [3.4530027457862]
This study explores the impact of adversarial perturbations on Convolutional Neural Networks (CNNs).
We introduce the Adversarial Intervention framework to study the vulnerability of a CNN to adversarial perturbations.
arXiv Detail & Related papers (2024-05-31T08:14:44Z) - A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of adversarial-example transferability across deep neural networks; a hedged transfer-attack sketch appears after this list.
arXiv Detail & Related papers (2023-10-26T17:45:26Z) - Investigating Human-Identifiable Features Hidden in Adversarial
Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z) - Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A
Contemporary Survey [114.17568992164303]
Adversarial attacks and defenses in machine learning and deep neural networks have been gaining significant attention.
This survey provides a comprehensive overview of the recent advancements in the field of adversarial attack and defense techniques.
New avenues of attack are also explored, including search-based, decision-based, drop-based, and physical-world attacks.
arXiv Detail & Related papers (2023-03-11T04:19:31Z) - Deviations in Representations Induced by Adversarial Attacks [0.0]
Research has shown that deep learning models are vulnerable to adversarial attacks.
This finding brought about a new direction in research, whereby algorithms were developed to attack and defend vulnerable networks.
We present a method for measuring and analyzing the deviations in representations induced by adversarial attacks.
arXiv Detail & Related papers (2022-11-07T17:40:08Z) - A Closer Look at Evaluating the Bit-Flip Attack Against Deep Neural
Networks [0.0]
Bit-Flip Attack (BFA) aims at drastically dropping the performance of a model by targeting its parameters stored in memory.
This work is the first to evaluate the impact of the BFA on fully-connected architectures; a hedged single-bit-flip sketch appears after this list.
arXiv Detail & Related papers (2022-09-28T17:04:39Z) - The Space of Adversarial Strategies [6.295859509997257]
Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade.
We propose a systematic approach to characterize worst-case (i.e., optimal) adversaries.
arXiv Detail & Related papers (2022-09-09T20:53:11Z) - On the Properties of Adversarially-Trained CNNs [4.769747792846005]
Adversarial Training has proved to be an effective training paradigm to enforce robustness against adversarial examples in modern neural network architectures.
We describe surprising properties of adversarially-trained models, shedding light on the mechanisms through which robustness against adversarial attacks is implemented; a minimal PGD adversarial-training sketch appears after this list.
arXiv Detail & Related papers (2022-03-17T11:11:52Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z)
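As noted in the transferability survey entry above, a transfer-based black-box attack crafts the adversarial example on a surrogate model with white-box access and simply replays it against an unseen target model. The sketch below is a minimal illustration under stated assumptions (two public torchvision models as surrogate and target, one-step FGSM, a random placeholder image); it is not the survey's benchmark protocol.

```python
import torch
import torchvision.models as models

# Surrogate: full white-box access. Target: only queried for predictions.
surrogate = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
target = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()

x = torch.rand(1, 3, 224, 224)        # placeholder image
y = surrogate(x).argmax(dim=1)        # use the surrogate's prediction as the label

# Craft a one-step FGSM example using gradients of the surrogate only.
x_adv = x.clone().requires_grad_(True)
torch.nn.functional.cross_entropy(surrogate(x_adv), y).backward()
x_adv = (x_adv + 8 / 255 * x_adv.grad.sign()).clamp(0, 1).detach()

# Transferability: check whether the unseen target model is also fooled.
print("target on clean input:     ", target(x).argmax(dim=1).item())
print("target on transferred adv.:", target(x_adv).argmax(dim=1).item())
```

In practice, transferability is typically reported as the fraction of surrogate-crafted examples that also fool the target model over a real evaluation set.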
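The Bit-Flip Attack entry above targets weights as they are stored in memory. The toy sketch below flips the most significant bit of a single simulated int8 weight to show why one flip can be so damaging; the layer, the quantization scheme, and the arbitrary choice of which bit to flip are illustrative assumptions, whereas the actual BFA searches for the most damaging bits rather than picking one at random.

```python
import torch

layer = torch.nn.Linear(16, 4)        # toy stand-in for a fully-connected layer

# Simulate int8 storage of the weights (symmetric per-tensor quantization).
scale = layer.weight.detach().abs().max() / 127
q = (layer.weight.detach() / scale).round().clamp(-128, 127).to(torch.int8)

def load_weights(q_int8):
    # Dequantize the stored int8 weights back into the layer.
    with torch.no_grad():
        layer.weight.copy_(q_int8.float() * scale)

x = torch.randn(1, 16)
load_weights(q)
out_clean = layer(x).detach()

# Flip the most significant bit of one stored weight (two's complement),
# which drastically changes its sign and magnitude.
q.view(-1)[0] ^= torch.tensor(-128, dtype=torch.int8)   # XOR with 0b1000_0000

load_weights(q)
print("output before bit flip:", out_clean)
print("output after bit flip: ", layer(x).detach())
```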
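The adversarially-trained CNNs entry above builds on adversarial training, most commonly instantiated as PGD training, i.e. training on worst-case perturbed inputs found by projected gradient descent. The sketch below is a minimal, hedged version; the toy CNN, the random data, and the standard L-infinity hyperparameters (eps = 8/255, step size 2/255, 7 steps) are assumptions, not the recipe analysed in that paper.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(8, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def pgd(x, y, eps=8 / 255, alpha=2 / 255, steps=7):
    # Projected gradient descent within an L-infinity ball of radius eps around x.
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()
        x_adv = x_adv + alpha * x_adv.grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)       # project back into the ball
    return x_adv.clamp(0, 1).detach()

for step in range(3):                                   # placeholder training loop
    x = torch.rand(16, 3, 32, 32)                       # placeholder images
    y = torch.randint(0, 10, (16,))                     # placeholder labels
    loss = F.cross_entropy(model(pgd(x, y)), y)         # train on worst-case inputs
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: adversarial loss = {loss.item():.3f}")
```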
This list is automatically generated from the titles and abstracts of the papers on this site.