Robust Ranking Explanations
- URL: http://arxiv.org/abs/2307.04024v1
- Date: Sat, 8 Jul 2023 18:05:41 GMT
- Title: Robust Ranking Explanations
- Authors: Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie
- Abstract summary: It is critical to make top salient features robust to adversarial attacks, especially those against the more vulnerable gradient-based explanations.
Existing defenses measure robustness using $\ell_p$-norms, which have weaker protection power.
We define explanation thickness for measuring salient feature ranking stability, and derive tractable surrogate bounds of the thickness to design the R2ET algorithm.
- Score: 16.217374556142484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust explanations of machine learning models are critical to establish
human trust in the models. Due to limited cognitive capacity, most humans can
only interpret the top few salient features. It is critical to make top salient
features robust to adversarial attacks, especially those against the more
vulnerable gradient-based explanations. Existing defenses measure robustness
using $\ell_p$-norms, which have weaker protection power. We define explanation
thickness for measuring salient feature ranking stability, and derive
tractable surrogate bounds of the thickness to design the \textit{R2ET}
algorithm to efficiently maximize the thickness and anchor top salient
features. Theoretically, we prove a connection between R2ET and adversarial
training. Experiments with a wide spectrum of network architectures and data
modalities, including brain networks, demonstrate that R2ET attains higher
explanation robustness under stealthy attacks while retaining accuracy.
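To make the thickness idea concrete, here is a minimal sketch, assuming a PyTorch model and simple gradient-based saliency. The function names, the random-perturbation sampling scheme, and all parameters are illustrative stand-ins, not the paper's exact definition:

```python
# A minimal sketch of a thickness-style ranking-stability score for
# gradient-based saliency. Thickness is approximated here as the average
# margin by which every top-k feature's saliency stays above every
# non-top-k feature's saliency under small random input perturbations.
import torch

def saliency(model, x):
    """Gradient-based saliency: |d(score)/d(input)| per input feature."""
    x = x.clone().detach().requires_grad_(True)
    model(x).sum().backward()          # scalar score, e.g. summed logits
    return x.grad.abs().flatten()

def ranking_thickness(model, x, k=5, n_samples=20, eps=0.05):
    """Mean top-k vs. rest saliency margin over perturbed copies of x."""
    base = saliency(model, x)
    topk = base.topk(k).indices.tolist()
    rest = [i for i in range(base.numel()) if i not in set(topk)]
    margins = []
    for _ in range(n_samples):
        s = saliency(model, x + eps * torch.randn_like(x))
        # positive margin: every original top-k feature still outranks
        # every non-top-k feature in the perturbed saliency map
        margins.append((s[topk].min() - s[rest].max()).item())
    return sum(margins) / len(margins)
```

A larger (more positive) score means the top-k ranking is harder to flip, which is the stability notion an $\ell_p$-norm bound on the saliency map does not directly capture.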
Related papers
- Robust Graph Neural Networks via Unbiased Aggregation [18.681451049083407]
The adversarial robustness of Graph Neural Networks (GNNs) has been questioned due to the false sense of security uncovered by strong adaptive attacks.
We provide a unified robust estimation point of view to understand their robustness and limitations.
arXiv Detail & Related papers (2023-11-25T05:34:36Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust instance-reweighted adversarial training framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- Provable Robust Saliency-based Explanations [16.217374556142484]
Experiments with a wide spectrum of network architectures and data modalities demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining model accuracy.
arXiv Detail & Related papers (2022-12-28T22:05:32Z)
- Revisiting Residual Networks for Adversarial Robustness: An Architectural Perspective [22.59262601575886]
We focus on residual networks and consider architecture design at the block level, i.e., topology, kernel size, activation, and normalization.
We present a portfolio of adversarially robust residual networks, RobustResNets, spanning a broad spectrum of model capacities.
arXiv Detail & Related papers (2022-12-21T13:19:25Z)
- Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network [33.18197518590706]
Adversarial examples provoke weak reliability and potential security issues in deep neural networks.
We propose a novel adversarial pruning method, Masking Adversarial Damage (MAD) that employs second-order information of adversarial loss.
We show that MAD effectively prunes adversarially trained networks without losing adversarial robustness and performs better than previous adversarial pruning methods.
arXiv Detail & Related papers (2022-04-06T11:28:06Z)
- Clustering Effect of (Linearized) Adversarial Robust Models [60.25668525218051]
We propose a novel understanding of adversarial robustness and apply it to more tasks, including domain adaptation and robustness boosting.
Experimental evaluations demonstrate the rationality and superiority of our proposed clustering strategy.
arXiv Detail & Related papers (2021-11-25T05:51:03Z)
- Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks [98.21130211336964]
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks.
In this paper, we investigate the impact of network width and depth on the robustness of adversarially trained DNNs.
arXiv Detail & Related papers (2021-10-07T23:13:33Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA), which is trained to automatically align features under attacks of arbitrary strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Adversarial Robustness under Long-Tailed Distribution [93.50792075460336]
Adversarial robustness has attracted extensive studies recently by revealing the vulnerability and intrinsic characteristics of deep networks.
In this work we investigate the adversarial vulnerability as well as defense under long-tailed distributions.
We propose a clean yet effective framework, RoBal, which consists of two dedicated modules: a scale-invariant classifier and data re-balancing.
arXiv Detail & Related papers (2021-04-06T17:53:08Z)
- Do Wider Neural Networks Really Help Adversarial Robustness? [92.8311752980399]
We show that the model robustness is closely related to the tradeoff between natural accuracy and perturbation stability.
We propose a new Width Adjusted Regularization (WAR) method that adaptively enlarges $\lambda$ on wide models.
arXiv Detail & Related papers (2020-10-03T04:46:17Z)
- Second Order Optimization for Adversarial Robustness and Interpretability [6.700873164609009]
We propose a novel regularizer which incorporates first and second order information via a quadratic approximation to the adversarial loss.
It is shown that using only a single iteration in our regularizer achieves stronger robustness than prior gradient and curvature regularization schemes.
It retains the interesting facet of adversarial training (AT) that networks learn features which are well-aligned with human perception. A rough sketch of this regularizer idea follows the list.
arXiv Detail & Related papers (2020-09-10T15:05:14Z)
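To make the quadratic-approximation idea in the last entry concrete, here is a minimal sketch, assuming a differentiable PyTorch loss computed from an input `x` that requires gradients. The function `curvature_regularizer`, the `eps` step size, and the choice of the gradient direction are hypothetical stand-ins, not the authors' exact regularizer:

```python
# A rough sketch of a regularizer built from a quadratic approximation
# of the adversarial loss,
#     L(x + d) ~= L(x) + g.d + 0.5 * d^T H d,
# evaluated along the input-gradient direction, with the Hessian-vector
# product H d obtained by double backprop.
import torch

def curvature_regularizer(loss, x, eps=0.1):
    """First- plus second-order predicted loss increase along grad_x(loss)."""
    g = torch.autograd.grad(loss, x, create_graph=True)[0]
    d = (eps * g / (g.flatten().norm() + 1e-12)).detach()  # fixed direction
    gd = (g * d).sum()                                     # first-order term g.d
    hd = torch.autograd.grad(gd, x, create_graph=True)[0]  # H d via double backprop
    return gd + 0.5 * (d * hd).sum()

# Hypothetical usage inside a training step (x must require gradients):
#   loss = criterion(model(x), y)
#   total = loss + lam * curvature_regularizer(loss, x)
#   total.backward()
```

Detaching the direction `d` keeps the second `autograd.grad` call equal to a true Hessian-vector product, so no explicit Hessian is ever materialized.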
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.