Proper Network Interpretability Helps Adversarial Robustness in
Classification
- URL: http://arxiv.org/abs/2006.14748v2
- Date: Wed, 21 Oct 2020 18:56:05 GMT
- Title: Proper Network Interpretability Helps Adversarial Robustness in
Classification
- Authors: Akhilan Boopathy, Sijia Liu, Gaoyuan Zhang, Cynthia Liu, Pin-Yu Chen,
Shiyu Chang, Luca Daniel
- Abstract summary: We show that with a proper measurement of interpretation, it is difficult to prevent prediction-evasion adversarial attacks from causing interpretation discrepancy.
We develop an interpretability-aware defensive scheme built only on promoting robust interpretation.
We show that our defense achieves both robust classification and robust interpretation, outperforming state-of-the-art adversarial training methods against large-perturbation attacks.
- Score: 91.39031895064223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have empirically shown that there exist adversarial examples
that can be hidden from neural network interpretability (namely, making network
interpretation maps visually similar), or interpretability is itself
susceptible to adversarial attacks. In this paper, we theoretically show that
with a proper measurement of interpretation, it is actually difficult to
prevent prediction-evasion adversarial attacks from causing interpretation
discrepancy, as confirmed by experiments on MNIST, CIFAR-10 and Restricted
ImageNet. Spurred by that, we develop an interpretability-aware defensive
scheme built only on promoting robust interpretation (without the need for
resorting to adversarial loss minimization). We show that our defense achieves
both robust classification and robust interpretation, outperforming
state-of-the-art adversarial training methods, particularly against
large-perturbation attacks.
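
As a concrete reading of the defensive scheme, here is a minimal sketch in PyTorch, assuming input-gradient saliency as the interpretation map and an l1 interpretation-discrepancy penalty; the perturbation model, function names, and hyperparameters are illustrative assumptions, not the authors' exact formulation.

```python
# Illustrative sketch only: the interpretation map (input-gradient saliency)
# and the l1 discrepancy measure are simplifying assumptions, not the
# paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def saliency(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """One simple interpretation map I(x): gradient of the loss w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x, create_graph=True)
    return grad

def interp_robust_loss(model, x, y, eps=8 / 255, lam=1.0):
    # A random l_inf perturbation stands in for the worst case; no adversarial
    # loss minimization is performed, matching the abstract's claim that the
    # defense is built only on promoting robust interpretation.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    discrepancy = (saliency(model, x, y) - saliency(model, x + delta, y)).abs().mean()
    return F.cross_entropy(model(x), y) + lam * discrepancy

# Toy usage on random data with a small linear classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
loss = interp_robust_loss(model, x, y)
loss.backward()  # gradients flow through both terms
```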
Related papers
- Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis [12.133306321357999]
We propose an uncertainty-based method for detecting adversarial attacks on neural networks for semantic segmentation.
We conduct a detailed analysis of uncertainty-based detection of adversarial attacks across various state-of-the-art neural networks.
Our numerical experiments show the effectiveness of the proposed uncertainty-based detection method.
arXiv Detail & Related papers (2024-08-19T14:13:30Z)
- Uncertainty-based Detection of Adversarial Attacks in Semantic Segmentation [16.109860499330562]
We introduce an uncertainty-based approach for the detection of adversarial attacks in semantic segmentation.
We demonstrate the ability of our approach to detect perturbed images across multiple types of adversarial attacks; a simple sketch of the underlying idea follows this entry.
arXiv Detail & Related papers (2023-05-22T08:36:35Z)
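
A minimal sketch of the uncertainty-based detection idea in the two entries above, assuming pixel-wise softmax entropy as the uncertainty measure and a simple mean-entropy threshold; both papers study richer uncertainty statistics and detectors.

```python
# Hypothetical sketch: flag an image as adversarial when its mean pixel-wise
# softmax entropy exceeds a threshold calibrated on clean validation data.
import torch
import torch.nn.functional as F

def mean_entropy(logits: torch.Tensor) -> torch.Tensor:
    """logits: (B, C, H, W) segmentation outputs -> mean pixel entropy per image."""
    p = F.softmax(logits, dim=1)
    entropy = -(p * torch.log(p.clamp_min(1e-12))).sum(dim=1)  # (B, H, W)
    return entropy.mean(dim=(1, 2))  # (B,)

def flag_adversarial(logits: torch.Tensor, threshold: float) -> torch.Tensor:
    # Adversarial inputs tend to push the network into higher-uncertainty
    # regions of the input space.
    return mean_entropy(logits) > threshold

# Toy usage: random "segmentation logits" for 2 images, 19 classes.
logits = torch.randn(2, 19, 64, 64)
print(flag_adversarial(logits, threshold=2.5))
```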
- Interpretability is a Kind of Safety: An Interpreter-based Ensemble for Adversary Defense [28.398901783858005]
We propose an interpreter-based ensemble framework called X-Ensemble for robust defense against adversaries.
X-Ensemble employs a Random Forest (RF) model to combine sub-detectors into an ensemble detector against hybrid adversarial attacks (sketched below).
arXiv Detail & Related papers (2023-04-14T04:32:06Z)
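
A hedged sketch of the ensemble-combination step named above; the sub-detector scores here are synthetic stand-ins, since X-Ensemble's actual sub-detectors operate on interpretation maps.

```python
# Illustrative only: several sub-detectors each score an input, and a Random
# Forest combines the scores into one adversarial/benign decision.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Pretend each of 3 sub-detectors emits one score per example.
scores_benign = rng.normal(0.0, 1.0, size=(200, 3))
scores_adv = rng.normal(1.5, 1.0, size=(200, 3))
X = np.vstack([scores_benign, scores_adv])
y = np.array([0] * 200 + [1] * 200)  # 0 = benign, 1 = adversarial

ensemble = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("train accuracy:", ensemble.score(X, y))
```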
- Adversarial Visual Robustness by Causal Intervention [56.766342028800445]
Adversarial training is the de facto most promising defense against adversarial examples.
Yet its passive nature inevitably leaves it vulnerable to unknown attackers.
We provide a causal viewpoint on adversarial vulnerability: the cause is the confounder that ubiquitously exists in learning.
arXiv Detail & Related papers (2021-06-17T14:23:54Z)
- Combating Adversaries with Anti-Adversaries [118.70141983415445]
In particular, our layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness at no cost in clean accuracy; a minimal sketch of the idea follows this entry.
arXiv Detail & Related papers (2021-03-26T09:36:59Z)
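
A minimal sketch of the anti-adversary idea, assuming a signed-gradient update toward the model's own predicted label; the step size and iteration count are illustrative, not the paper's settings.

```python
# Before classification, nudge the input a few gradient steps that *decrease*
# the loss of the model's own predicted label, i.e. the opposite direction an
# adversary would take.
import torch
import torch.nn as nn
import torch.nn.functional as F

def anti_adversary(model: nn.Module, x: torch.Tensor,
                   steps: int = 2, alpha: float = 2 / 255) -> torch.Tensor:
    y_pred = model(x).argmax(dim=1)  # pseudo-label from the undefended model
    x_aa = x.clone().detach()
    for _ in range(steps):
        x_aa.requires_grad_(True)
        loss = F.cross_entropy(model(x_aa), y_pred)
        (grad,) = torch.autograd.grad(loss, x_aa)
        # Move against the loss gradient: reinforce the predicted class.
        x_aa = (x_aa - alpha * grad.sign()).detach().clamp(0, 1)
    return x_aa

# Usage: classify the anti-perturbed input instead of the raw one.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
logits = model(anti_adversary(model, x))
```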
- Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks [3.222802562733787]
We show that for deterministic neural networks, saliency interpretations are remarkably brittle even when the attacks fail.
We suggest and demonstrate empirically that saliency explanations provided by Bayesian neural networks are considerably more stable under adversarial perturbations (see the sketch below).
arXiv Detail & Related papers (2021-02-22T14:07:24Z)
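
One possible way to measure this stability claim empirically, assuming MC dropout as a cheap stand-in for a Bayesian network and cosine similarity between averaged saliency maps; this is an illustration, not the paper's protocol.

```python
# Average input gradients over stochastic forward passes (MC dropout) and
# compare the clean and perturbed saliency maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mc_saliency(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                samples: int = 8) -> torch.Tensor:
    model.train()  # keep dropout active at "test" time (MC dropout)
    maps = []
    for _ in range(samples):
        xg = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(xg), y)
        (g,) = torch.autograd.grad(loss, xg)
        maps.append(g)
    return torch.stack(maps).mean(dim=0)

model = nn.Sequential(nn.Flatten(), nn.Dropout(0.5), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(2, 3, 32, 32), torch.randint(0, 10, (2,))
clean = mc_saliency(model, x, y).flatten(1)
pert = mc_saliency(model, x + 0.03 * torch.randn_like(x), y).flatten(1)
print(F.cosine_similarity(clean, pert, dim=1))  # closer to 1 = more stable
```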
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
We consider non-robust features a common property of adversarial examples and deduce that it is possible to find a cluster in representation space corresponding to this property.
This leads us to estimate the probability distribution of adversarial representations as a separate cluster, and to leverage that distribution for a likelihood-based adversarial detector (sketched below).
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
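
A hedged sketch of the likelihood-based detection idea, with synthetic feature vectors standing in for the learned representations used in the paper.

```python
# Fit a Gaussian to the representations of known adversarial examples and
# flag new inputs whose representation is too likely under that density.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
adv_feats = rng.normal(loc=2.0, scale=1.0, size=(500, 8))   # adversarial cluster
test_feats = np.vstack([rng.normal(0.0, 1.0, (5, 8)),       # clean-like
                        rng.normal(2.0, 1.0, (5, 8))])      # adversarial-like

# Maximum-likelihood Gaussian for the adversarial cluster.
mu, cov = adv_feats.mean(axis=0), np.cov(adv_feats, rowvar=False)
loglik = multivariate_normal(mean=mu, cov=cov).logpdf(test_feats)

is_adv = loglik > np.quantile(loglik, 0.5)  # threshold is illustrative
print(loglik.round(1), is_adv)
```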
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
- Adversarial Attacks and Defenses: An Interpretation Perspective [80.23908920686625]
We review recent work on adversarial attacks and defenses, particularly from the perspective of machine learning interpretation.
The goal of model interpretation, or interpretable machine learning, is to extract human-understandable descriptions of how models work.
For each type of interpretation, we elaborate on how it could be used for adversarial attacks and defenses.
arXiv Detail & Related papers (2020-04-23T23:19:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.