Provable Robust Saliency-based Explanations
- URL: http://arxiv.org/abs/2212.14106v3
- Date: Sat, 8 Jul 2023 17:57:36 GMT
- Title: Provable Robust Saliency-based Explanations
- Authors: Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie
- Abstract summary: We show that R2ET attains higher explanation robustness under stealthy attacks while retaining model accuracy.
Experiments with a wide spectrum of network architectures and data modalities demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining model accuracy.
- Score: 16.217374556142484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust explanations of machine learning models are critical to establishing
human trust in the models. The top-$k$ intersection is widely used to evaluate
the robustness of explanations. However, most existing attacking and defense
strategies are based on $\ell_p$ norms, thus creating a mismatch between the
evaluation and optimization objectives. To this end, we define explanation
thickness for measuring top-$k$ salient features ranking stability, and design
the \textit{R2ET} algorithm based on a novel tractable surrogate to maximize
the thickness and stabilize the top salient features efficiently.
Theoretically, we prove a connection between R2ET and adversarial training;
using a novel multi-objective optimization formulation and a generalization
error bound, we further prove that the surrogate objective can improve both the
numerical and statistical stability of the explanations. Experiments with a
wide spectrum of network architectures and data modalities demonstrate that
R2ET attains higher explanation robustness under stealthy attacks while
retaining model accuracy.
Related papers
- Is Smoothness the Key to Robustness? A Comparison of Attention and Convolution Models Using a Novel Metric [0.0]
Existing robustness evaluation approaches often lack theoretical generality or rely heavily on empirical assessments.
We propose TopoLip, a metric based on layer-wise analysis that bridges topological data analysis and Lipschitz continuity for robustness evaluation.
arXiv Detail & Related papers (2024-10-23T07:44:14Z) - Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z) - Robust Feature Inference: A Test-time Defense Strategy using Spectral Projections [12.807619042576018]
We propose a novel test-time defense strategy called Robust Feature Inference (RFI)
RFI is easy to integrate with any existing (robust) training procedure without additional test-time computation.
We show that RFI improves robustness across adaptive and transfer attacks consistently.
arXiv Detail & Related papers (2023-07-21T16:18:58Z) - Robust Ranking Explanations [16.217374556142484]
It is critical to make top salient features robust to adversarial attacks, especially those against the more vulnerable gradient-based explanations.
Existing defense measures using $ell_p$-norms, which have weaker protection power.
We define explanation thickness for measuring salient features ranking stability, and derive tractable surrogate bounds of the thickness to design the textitR2ET algorithm.
arXiv Detail & Related papers (2023-07-08T18:05:41Z) - Double Pessimism is Provably Efficient for Distributionally Robust
Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage [15.858892479232656]
We study robust offline reinforcement learning (robust offline RL)
We propose a generic algorithm framework called Doubly Pessimistic Model-based Policy Optimization ($P2MPO$)
We show that $P2MPO$ enjoys a $tildemathcalO(n-1/2)$ convergence rate, where $n$ is the dataset size.
arXiv Detail & Related papers (2023-05-16T17:58:05Z) - Semantic Image Attack for Visual Model Diagnosis [80.36063332820568]
In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models.
This paper proposes Semantic Image Attack (SIA), a method based on the adversarial attack that provides semantic adversarial images.
arXiv Detail & Related papers (2023-03-23T03:13:04Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - A general framework for defining and optimizing robustness [74.67016173858497]
We propose a rigorous and flexible framework for defining different types of robustness properties for classifiers.
Our concept is based on postulates that robustness of a classifier should be considered as a property that is independent of accuracy.
We develop a very general robustness framework that is applicable to any type of classification model.
arXiv Detail & Related papers (2020-06-19T13:24:20Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with $textitexternal uncertainty$ in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.