A Unified Game-Theoretic Interpretation of Adversarial Robustness
- URL: http://arxiv.org/abs/2111.03536v2
- Date: Mon, 8 Nov 2021 05:26:14 GMT
- Title: A Unified Game-Theoretic Interpretation of Adversarial Robustness
- Authors: Jie Ren, Die Zhang, Yisen Wang, Lu Chen, Zhanpeng Zhou, Yiting Chen,
Xu Cheng, Xin Wang, Meng Zhou, Jie Shi, Quanshi Zhang
- Abstract summary: This paper provides a unified view to explain different adversarial attacks and defense methods.
Our findings suggest a potential way to unify adversarial perturbations and robustness, which can explain existing defense methods in a principled way.
- Score: 39.64586231421121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper provides a unified view to explain different adversarial attacks
and defense methods, i.e., the view of multi-order interactions between
input variables of DNNs. Based on the multi-order interaction, we discover that
adversarial attacks mainly affect high-order interactions to fool the DNN.
Furthermore, we find that the robustness of adversarially trained DNNs comes
from category-specific low-order interactions. Our findings provide a potential
method to unify adversarial perturbations and robustness, which can explain
existing defense methods in a principled way. In addition, our findings revise
a previous, inaccurate understanding of the shape bias of adversarially
learned features.
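The view above hinges on one concrete quantity. As a minimal sketch (not the authors' code), the following Monte Carlo estimator assumes the multi-order interaction definition from the authors' companion work listed below (Game-theoretic Understanding of Adversarially Learned Features): I^(m)(i,j) = E_{S ⊆ N\{i,j}, |S|=m}[f(S∪{i,j}) - f(S∪{i}) - f(S∪{j}) + f(S)]. The callable `f` is a hypothetical stand-in for a masked forward pass that scores an input with only the given variables kept.

```python
import random

def multi_order_interaction(f, n, i, j, m, num_samples=100):
    """Monte Carlo estimate of the order-m interaction I^(m)(i, j).

    f: hypothetical callable mapping a frozenset of retained variable
       indices to a scalar network output (all other variables masked).
    n: total number of input variables; i, j: the pair of interest.
    m: interaction order, i.e. the size of the context S (0 <= m <= n - 2).
    """
    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for _ in range(num_samples):
        S = frozenset(random.sample(others, m))
        # Delta f(i, j, S) = f(S+{i,j}) - f(S+{i}) - f(S+{j}) + f(S)
        total += f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S)
    return total / num_samples
```

Under this definition, the order m is the context size: low-order interactions capture local pairwise structure, while high-order interactions (m close to n) depend on large contexts, which is where the abstract locates the effect of adversarial perturbations.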
Related papers
- Joint Universal Adversarial Perturbations with Interpretations [19.140429650679593]
In this paper, we propose a novel attack framework to generate joint universal adversarial perturbations (JUAP).
To the best of our knowledge, this is the first effort to study UAP for jointly attacking both DNNs and interpretations.
arXiv Detail & Related papers (2024-08-03T08:58:04Z)
- Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization [31.516568778193157]
Adversarial training (AT) is often adopted to improve the robustness of deep neural networks (DNNs).
In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR); a generic sketch of input-gradient regularization appears after this list.
Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the network's predictions are easy to interpret.
arXiv Detail & Related papers (2022-07-09T01:06:41Z)
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training is proven to be the most effective strategy: it injects adversarial examples into model training (see the PGD adversarial-training sketch after this list).
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining (LADDER).
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution [83.84968082791444]
Deep neural networks are vulnerable to intentionally crafted adversarial examples.
Various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models.
arXiv Detail & Related papers (2021-08-29T08:11:36Z)
- Interpreting Attributions and Interactions of Adversarial Attacks [19.50612458496236]
This paper aims to explain adversarial attacks in terms of how adversarial perturbations contribute to the attacking task.
We define and quantify interactions among adversarial perturbation pixels, and decompose the entire perturbation map into relatively independent perturbation components.
arXiv Detail & Related papers (2021-08-16T04:59:39Z)
- Adversarial Robustness through the Lens of Causality [105.51753064807014]
The adversarial vulnerability of deep neural networks has attracted significant attention in machine learning.
We propose to incorporate causality into mitigating adversarial vulnerability.
Our method can be seen as the first attempt to leverage causality for mitigating adversarial vulnerability.
arXiv Detail & Related papers (2021-06-11T06:55:02Z)
- Towards Defending against Adversarial Examples via Attack-Invariant Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples.
Models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples.
arXiv Detail & Related papers (2021-06-09T12:49:54Z)
- Game-theoretic Understanding of Adversarially Learned Features [38.19291233245746]
This paper aims to understand adversarial attacks and defense from a new perspective, i.e., the signal-processing behavior of DNNs.
We define a novel multi-order interaction in game theory, which satisfies six properties.
arXiv Detail & Related papers (2021-03-12T15:56:28Z)
- Recent Advances in Understanding Adversarial Robustness of Deep Neural Networks [15.217367754000913]
It is increasingly important to obtain models with high robustness that are resistant to adversarial examples.
We give preliminary definitions on what adversarial attacks and robustness are.
We survey frequently used benchmarks and theoretically proven bounds for adversarial robustness.
arXiv Detail & Related papers (2020-11-03T07:42:53Z)
- Proper Network Interpretability Helps Adversarial Robustness in Classification [91.39031895064223]
We show that with a proper measurement of interpretation, it is difficult to prevent prediction-evasion adversarial attacks from causing interpretation discrepancy.
We develop an interpretability-aware defensive scheme built only on promoting robust interpretation.
We show that our defense achieves both robust classification and robust interpretation, outperforming state-of-the-art adversarial training methods against attacks with large perturbations.
arXiv Detail & Related papers (2020-06-26T01:31:31Z)
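As referenced in the J-SIGR entry above, here is a minimal generic sketch of input-gradient regularization in PyTorch. It is not the J-SIGR method itself: the penalty weight `lam` is a hypothetical hyperparameter, and the paper's "selective" masking of which gradients to penalize is omitted.

```python
import torch
import torch.nn.functional as F

def input_gradient_regularized_loss(model, x, y, lam=0.1):
    """Cross-entropy plus a penalty on the input-gradient norm."""
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    # Gradient of the loss w.r.t. the input, kept in the graph so the
    # penalty is itself differentiable (double backpropagation).
    grad = torch.autograd.grad(ce, x, create_graph=True)[0]
    penalty = grad.pow(2).flatten(1).sum(dim=1).mean()
    return ce + lam * penalty
```

Penalizing the input gradient flattens the loss surface around clean inputs, which is one common intuition for why such regularizers reduce the transferability of adversarial examples.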
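Several entries above build on adversarial training, which injects adversarial examples into model training. For reference, a minimal L-infinity PGD adversarial-training sketch; the values of eps, alpha, and steps are illustrative defaults, not taken from any paper listed here.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: signed-gradient ascent steps projected to the eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One training step on PGD-perturbed inputs."""
    model.eval()                      # keep batch-norm stats fixed while attacking
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```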