Joint Universal Adversarial Perturbations with Interpretations
- URL: http://arxiv.org/abs/2408.01715v1
- Date: Sat, 3 Aug 2024 08:58:04 GMT
- Title: Joint Universal Adversarial Perturbations with Interpretations
- Authors: Liang-bo Ning, Zeyu Dai, Wenqi Fan, Jingran Su, Chao Pan, Luning Wang, Qing Li,
- Abstract summary: In this paper, we propose a novel attacking framework to generate joint universal adversarial perturbations (JUAP)
To the best of our knowledge, this is the first effort to study UAP for jointly attacking both DNNs and interpretations.
- Score: 19.140429650679593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have significantly boosted the performance of many challenging tasks. Despite the great development, DNNs have also exposed their vulnerability. Recent studies have shown that adversaries can manipulate the predictions of DNNs by adding a universal adversarial perturbation (UAP) to benign samples. On the other hand, increasing efforts have been made to help users understand and explain the inner working of DNNs by highlighting the most informative parts (i.e., attribution maps) of samples with respect to their predictions. Moreover, we first empirically find that such attribution maps between benign and adversarial examples have a significant discrepancy, which has the potential to detect universal adversarial perturbations for defending against adversarial attacks. This finding motivates us to further investigate a new research problem: whether there exist universal adversarial perturbations that are able to jointly attack DNNs classifier and its interpretation with malicious desires. It is challenging to give an explicit answer since these two objectives are seemingly conflicting. In this paper, we propose a novel attacking framework to generate joint universal adversarial perturbations (JUAP), which can fool the DNNs model and misguide the inspection from interpreters simultaneously. Comprehensive experiments on various datasets demonstrate the effectiveness of the proposed method JUAP for joint attacks. To the best of our knowledge, this is the first effort to study UAP for jointly attacking both DNNs and interpretations.
Related papers
- Detecting Adversarial Examples [24.585379549997743]
We propose a novel method to detect adversarial examples by analyzing the layer outputs of Deep Neural Networks.
Our method is highly effective, compatible with any DNN architecture, and applicable across different domains, such as image, video, and audio.
arXiv Detail & Related papers (2024-10-22T21:42:59Z) - Relationship between Uncertainty in DNNs and Adversarial Attacks [0.0]
Deep Neural Networks (DNNs) have achieved state of the art results and even outperformed human accuracy in many challenging tasks.
DNNs are accompanied by uncertainty about their results, causing them to predict an outcome that is either incorrect or outside of a certain level of confidence.
arXiv Detail & Related papers (2024-09-20T05:38:38Z) - Not So Robust After All: Evaluating the Robustness of Deep Neural
Networks to Unseen Adversarial Attacks [5.024667090792856]
Deep neural networks (DNNs) have gained prominence in various applications, such as classification, recognition, and prediction.
A fundamental attribute of traditional DNNs is their vulnerability to modifications in input data, which has resulted in the investigation of adversarial attacks.
This study aims to challenge the efficacy and generalization of contemporary defense mechanisms against adversarial attacks.
arXiv Detail & Related papers (2023-08-12T05:21:34Z) - Latent Boundary-guided Adversarial Training [61.43040235982727]
Adrial training is proved to be the most effective strategy that injects adversarial examples into model training.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z) - A Comprehensive Survey on Trustworthy Graph Neural Networks: Privacy,
Robustness, Fairness, and Explainability [59.80140875337769]
Graph Neural Networks (GNNs) have made rapid developments in the recent years.
GNNs can leak private information, are vulnerable to adversarial attacks, can inherit and magnify societal bias from training data.
This paper gives a comprehensive survey of GNNs in the computational aspects of privacy, robustness, fairness, and explainability.
arXiv Detail & Related papers (2022-04-18T21:41:07Z) - A Unified Game-Theoretic Interpretation of Adversarial Robustness [39.64586231421121]
This paper provides a unified view to explain different adversarial attacks and defense methods.
Our findings provide a potential method to unify adversarial perturbations and robustness, which can explain the existing defense methods in a principle way.
arXiv Detail & Related papers (2021-11-05T14:57:49Z) - Jointly Attacking Graph Neural Network and its Explanations [50.231829335996814]
Graph Neural Networks (GNNs) have boosted the performance for many graph-related tasks.
Recent studies have shown that GNNs are highly vulnerable to adversarial attacks, where adversaries can mislead the GNNs' prediction by modifying graphs.
We propose a novel attack framework (GEAttack) which can attack both a GNN model and its explanations by simultaneously exploiting their vulnerabilities.
arXiv Detail & Related papers (2021-08-07T07:44:33Z) - A Survey On Universal Adversarial Attack [68.1815935074054]
Deep neural networks (DNNs) have demonstrated remarkable performance for various applications.
They are widely known to be vulnerable to the attack of adversarial perturbations.
Universal adversarial perturbations (UAPs) fool the target DNN for most images.
arXiv Detail & Related papers (2021-03-02T06:35:09Z) - Recent Advances in Understanding Adversarial Robustness of Deep Neural
Networks [15.217367754000913]
It is increasingly important to obtain models with high robustness that are resistant to adversarial examples.
We give preliminary definitions on what adversarial attacks and robustness are.
We study frequently-used benchmarks and mention theoretically-proved bounds for adversarial robustness.
arXiv Detail & Related papers (2020-11-03T07:42:53Z) - Double Targeted Universal Adversarial Perturbations [83.60161052867534]
We introduce a double targeted universal adversarial perturbations (DT-UAPs) to bridge the gap between the instance-discriminative image-dependent perturbations and the generic universal perturbations.
We show the effectiveness of the proposed DTA algorithm on a wide range of datasets and also demonstrate its potential as a physical attack.
arXiv Detail & Related papers (2020-10-07T09:08:51Z) - Adversarial Attacks and Defenses on Graphs: A Review, A Tool and
Empirical Studies [73.39668293190019]
Adversary attacks can be easily fooled by small perturbation on the input.
Graph Neural Networks (GNNs) have been demonstrated to inherit this vulnerability.
In this survey, we categorize existing attacks and defenses, and review the corresponding state-of-the-art methods.
arXiv Detail & Related papers (2020-03-02T04:32:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.