Interpreting Attributions and Interactions of Adversarial Attacks
- URL: http://arxiv.org/abs/2108.06895v1
- Date: Mon, 16 Aug 2021 04:59:39 GMT
- Title: Interpreting Attributions and Interactions of Adversarial Attacks
- Authors: Xin Wang, Shuyun Lin, Hao Zhang, Yufei Zhu, Quanshi Zhang
- Abstract summary: This paper aims to explain adversarial attacks in terms of how adversarial perturbations contribute to the attacking task.
We define and quantify interactions among adversarial perturbation pixels, and decompose the entire perturbation map into relatively independent perturbation components.
- Score: 19.50612458496236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper aims to explain adversarial attacks in terms of how adversarial
perturbations contribute to the attacking task. We estimate attributions of
different image regions to the decrease of the attacking cost based on the
Shapley value. We define and quantify interactions among adversarial
perturbation pixels, and decompose the entire perturbation map into relatively
independent perturbation components. The decomposition of the perturbation map
shows that adversarially-trained DNNs have more perturbation components in the
foreground than normally-trained DNNs. Moreover, compared to the
normally-trained DNN, the adversarially-trained DNN has more components that
mainly decrease the score of the true category. The above analyses provide new
insights into the understanding of adversarial attacks.
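To make the above pipeline concrete, here is a minimal sketch of how region-level Shapley attributions and pairwise interactions could be estimated by Monte Carlo sampling. It assumes a PyTorch classifier `model`, a clean image tensor `x` of shape (C, H, W), its true label `label`, and a precomputed adversarial perturbation `delta`; the grid partition, the logit-based attacking cost, and the sampling scheme are illustrative assumptions rather than the authors' exact implementation.

```python
# Hypothetical sketch (not the authors' code): Monte Carlo estimation of
# Shapley attributions of perturbation regions to the decrease of an
# illustrative attacking cost, plus a pairwise Shapley-style interaction.
import random
import torch

def attacking_cost(model, x, label):
    """Illustrative attacking cost: the true-class logit (lower = stronger attack)."""
    with torch.no_grad():
        return model(x.unsqueeze(0))[0, label].item()

def region_masks(h, w, grid=4):
    """Partition an HxW image into grid x grid rectangular regions (0/1 masks)."""
    masks = []
    for i in range(grid):
        for j in range(grid):
            m = torch.zeros(h, w)
            m[i * h // grid:(i + 1) * h // grid,
              j * w // grid:(j + 1) * w // grid] = 1.0
            masks.append(m)
    return masks

def shapley_attributions(model, x, delta, label, masks, n_samples=200):
    """Estimate each region's Shapley attribution to the drop in attacking cost."""
    n = len(masks)
    phi = [0.0] * n
    base = attacking_cost(model, x, label)       # cost with no perturbation applied
    for _ in range(n_samples):
        order = random.sample(range(n), n)       # one random permutation of regions
        applied = torch.zeros_like(masks[0])
        prev_cost = base
        for idx in order:
            applied = applied + masks[idx]       # add this region's perturbation
            cost = attacking_cost(model, x + delta * applied, label)
            phi[idx] += (prev_cost - cost) / n_samples   # marginal decrease of the cost
            prev_cost = cost
    return phi

def pairwise_interaction(model, x, delta, label, masks, i, j, n_samples=100):
    """Shapley-style interaction between regions i and j, averaged over random
    contexts S: B(S+{i,j}) - B(S+{i}) - B(S+{j}) + B(S), with B the attacking cost."""
    others = [k for k in range(len(masks)) if k not in (i, j)]
    total = 0.0
    for _ in range(n_samples):
        s = [k for k in others if random.random() < 0.5]   # random context of other regions
        def cost_with(extra):
            applied = torch.zeros_like(masks[0])
            for k in s + extra:
                applied = applied + masks[k]
            return attacking_cost(model, x + delta * applied, label)
        total += cost_with([i, j]) - cost_with([i]) - cost_with([j]) + cost_with([])
    return total / n_samples
```

Under this sketch, regions with large attributions are those whose perturbations most reduce the attacking cost, and grouping regions with strong mutual interactions (and weak interactions across groups) is one plausible way to obtain the relatively independent perturbation components described above.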
Related papers
- Attack Anything: Blind DNNs via Universal Background Adversarial Attack [17.73886733971713]
It has been widely substantiated that deep neural networks (DNNs) are susceptible and vulnerable to adversarial perturbations.
We propose a background adversarial attack framework to attack anything, by which the attack efficacy generalizes well between diverse objects, models, and tasks.
We conduct comprehensive and rigorous experiments in both digital and physical domains across various objects, models, and tasks, demonstrating that the proposed method can effectively attack anything.
arXiv Detail & Related papers (2024-08-17T12:46:53Z)
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training, which injects adversarial examples into model training, has proven to be the most effective defense strategy.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- A Unified Game-Theoretic Interpretation of Adversarial Robustness [39.64586231421121]
This paper provides a unified view to explain different adversarial attacks and defense methods.
Our findings provide a potential method to unify adversarial perturbations and robustness, which can explain the existing defense methods in a principled way.
arXiv Detail & Related papers (2021-11-05T14:57:49Z)
- Adversarial Robustness through the Lens of Causality [105.51753064807014]
The adversarial vulnerability of deep neural networks has attracted significant attention in machine learning.
We propose to incorporate causality into mitigating adversarial vulnerability.
Our method can be seen as the first attempt to leverage causality for mitigating adversarial vulnerability.
arXiv Detail & Related papers (2021-06-11T06:55:02Z)
- Combating Adversaries with Anti-Adversaries [118.70141983415445]
In particular, our layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost on clean accuracy.
arXiv Detail & Related papers (2021-03-26T09:36:59Z)
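As a rough illustration of the anti-adversary idea summarized in the entry above, the sketch below nudges an input away from the decision boundary before classifying it. The classifier `model`, the step size `alpha`, the number of steps, and the use of the model's own prediction as a pseudo-label are illustrative assumptions rather than the paper's exact formulation.

```python
# Hypothetical sketch: an "anti-adversary" preprocessing step that perturbs the
# input in the direction that increases confidence in the model's predicted
# class, i.e. opposite to a gradient-based adversarial direction.
import torch
import torch.nn.functional as F

def anti_adversary(model, x, alpha=2 / 255, steps=2):
    """Classify x after nudging it away from the decision boundary (x is a batch)."""
    x_anti = x.clone().detach()
    with torch.no_grad():
        pred = model(x_anti).argmax(dim=1)        # pseudo-label from the model itself
    for _ in range(steps):
        x_anti.requires_grad_(True)
        loss = F.cross_entropy(model(x_anti), pred)
        grad = torch.autograd.grad(loss, x_anti)[0]
        # step *against* the loss gradient: the opposite of an FGSM-style attack step
        x_anti = (x_anti - alpha * grad.sign()).detach()
    return model(x_anti)                          # classify the anti-perturbed input
```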
- Game-theoretic Understanding of Adversarially Learned Features [38.19291233245746]
This paper aims to understand adversarial attacks and defense from a new perspective, i.e., the signal-processing behavior of DNNs.
We define the multi-order interaction in game theory, which satisfies six properties.
arXiv Detail & Related papers (2021-03-12T15:56:28Z)
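For context, the following is a hedged sketch of how a multi-order interaction between two input variables i and j is typically defined in this line of game-theoretic work; the exact formulation and normalization in the cited paper may differ.

```latex
% Hedged sketch (may differ from the cited paper's exact definition):
% interaction between variables i and j over contexts S of a fixed size m.
\[
  I^{(m)}(i,j) = \mathbb{E}_{S \subseteq N \setminus \{i,j\},\; |S| = m}
  \bigl[ \Delta v(i,j,S) \bigr],
\]
\[
  \Delta v(i,j,S) = v(S \cup \{i,j\}) - v(S \cup \{i\}) - v(S \cup \{j\}) + v(S),
\]
% where N is the set of input variables and v(S) is the network output
% when only the variables in S are kept.
```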
- Interpreting Deep Neural Networks with Relative Sectional Propagation by Analyzing Comparative Gradients and Hostile Activations [37.11665902583138]
We propose a new attribution method, Relative Sectional Propagation (RSP), for decomposing the output predictions of Deep Neural Networks (DNNs).
We define the hostile factor as an element that interferes with finding the attributions of the target, and propagate it in a distinguishable way to overcome the non-suppressed nature of activated neurons.
Our method makes it possible to decompose the predictions of DNNs with clearer class-discriminativeness and detailed elucidations of activation neurons compared to the conventional attribution methods.
arXiv Detail & Related papers (2020-12-07T03:11:07Z)
- Improving adversarial robustness of deep neural networks by using semantic information [17.887586209038968]
Adversarial training is the main method for improving adversarial robustness and the first line of defense against adversarial attacks.
This paper provides a new perspective on the issue of adversarial robustness, one that shifts the focus from the network as a whole to the critical part of the region close to the decision boundary corresponding to a given class.
Experimental results on the MNIST and CIFAR-10 datasets show that this approach greatly improves adversarial robustness even using a very small dataset from the training data.
arXiv Detail & Related papers (2020-08-18T10:23:57Z)
- Proper Network Interpretability Helps Adversarial Robustness in Classification [91.39031895064223]
We show that with a proper measurement of interpretation, it is difficult to prevent prediction-evasion adversarial attacks from causing interpretation discrepancy.
We develop an interpretability-aware defensive scheme built only on promoting robust interpretation.
We show that our defense achieves both robust classification and robust interpretation, outperforming state-of-the-art adversarial training methods against attacks of large perturbation.
arXiv Detail & Related papers (2020-06-26T01:31:31Z)
- Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation [79.42338812621874]
Adversarial training is promising for improving the robustness of deep neural networks against adversarial perturbations.
We formulate a general adversarial training procedure that can perform decently on both adversarial and clean samples.
We propose a dynamic divide-and-conquer adversarial training (DDC-AT) strategy to enhance the defense effect.
arXiv Detail & Related papers (2020-03-14T05:06:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.