Provable Robust Saliency-based Explanations
- URL: http://arxiv.org/abs/2212.14106v3
- Date: Sat, 8 Jul 2023 17:57:36 GMT
- Title: Provable Robust Saliency-based Explanations
- Authors: Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie
- Abstract summary: We define explanation thickness for measuring the ranking stability of top-$k$ salient features and design the R2ET algorithm to maximize it.
Experiments with a wide spectrum of network architectures and data modalities demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining model accuracy.
- Score: 16.217374556142484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust explanations of machine learning models are critical to establishing
human trust in the models. The top-$k$ intersection is widely used to evaluate
the robustness of explanations. However, most existing attack and defense
strategies are based on $\ell_p$ norms, thus creating a mismatch between the
evaluation and optimization objectives. To this end, we define explanation
thickness for measuring the ranking stability of the top-$k$ salient features, and design
the \textit{R2ET} algorithm based on a novel tractable surrogate to maximize
the thickness and stabilize the top salient features efficiently.
Theoretically, we prove a connection between R2ET and adversarial training;
using a novel multi-objective optimization formulation and a generalization
error bound, we further prove that the surrogate objective can improve both the
numerical and statistical stability of the explanations. Experiments with a
wide spectrum of network architectures and data modalities demonstrate that
R2ET attains higher explanation robustness under stealthy attacks while
retaining model accuracy.
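For concreteness, here is a minimal sketch (illustrative only, not the authors' released code) of the top-$k$ intersection metric used for evaluation, together with a simple top-$k$ versus non-top-$k$ score margin in the spirit of explanation thickness; the function names and the margin formulation are assumptions for illustration.
```python
# Minimal sketch: top-k intersection between two saliency maps, plus a
# top-k/non-top-k ranking margin as a proxy for how hard the top-k set
# is to flip. Illustrative only; not the paper's exact definitions.
import numpy as np

def topk_intersection(saliency_a: np.ndarray, saliency_b: np.ndarray, k: int) -> float:
    """Fraction of features shared by the top-k of two saliency maps."""
    top_a = set(np.argsort(-saliency_a)[:k])
    top_b = set(np.argsort(-saliency_b)[:k])
    return len(top_a & top_b) / k

def ranking_margin(saliency: np.ndarray, k: int) -> float:
    """Smallest gap between any top-k score and the best non-top-k score.

    A larger margin means a perturbation must change the saliency scores
    more before the top-k set changes -- the intuition behind thickness.
    """
    order = np.argsort(-saliency)
    topk, rest = saliency[order[:k]], saliency[order[k:]]
    return float(topk.min() - rest.max())

if __name__ == "__main__":
    clean = np.array([0.9, 0.7, 0.6, 0.2, 0.1])
    attacked = np.array([0.9, 0.2, 0.6, 0.7, 0.1])  # feature 1 pushed out of the top-3
    print(topk_intersection(clean, attacked, k=3))  # ~0.667
    print(ranking_margin(clean, k=3))               # 0.4
```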
Related papers
- RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects [12.5122702720856]
We propose Robust Fine-Tuning (RbFT) to enhance the resilience of large language models against retrieval defects.
Experimental results demonstrate that RbFT significantly improves the robustness of RAG systems across diverse retrieval conditions.
arXiv Detail & Related papers (2025-01-30T14:15:09Z)
- Stability Evaluation via Distributional Perturbation Analysis [28.379994938809133]
We propose a stability evaluation criterion based on distributional perturbations.
Our stability evaluation criterion can address both data corruptions and sub-population shifts.
Empirically, we validate the practical utility of our stability evaluation criterion across a host of real-world applications.
arXiv Detail & Related papers (2024-05-06T06:47:14Z)
- The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning risks skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust instance-reweighted adversarial training framework.
Our importance weights are obtained by optimizing a KL-divergence regularized loss function (a toy sketch of this reweighting idea follows this entry).
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
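As a hedged aside on the reweighting idea above: if instance weights maximize a weighted sum of per-example losses minus a KL penalty toward the uniform distribution, the optimal weights take a softmax-of-losses closed form. The objective and temperature below are illustrative assumptions, not necessarily the paper's exact formulation.
```python
# Illustrative sketch only: KL-regularized instance reweighting.
# Maximizing  sum_i w_i * loss_i - tau * KL(w || uniform)  over the simplex
# yields  w_i proportional to exp(loss_i / tau)  (a softmax over losses),
# so harder examples receive larger weights. The temperature `tau` and this
# exact objective are assumptions for illustration.
import numpy as np

def kl_regularized_weights(losses: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Softmax-of-losses instance weights on the probability simplex."""
    z = (losses - losses.max()) / tau  # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

if __name__ == "__main__":
    per_example_losses = np.array([0.1, 0.5, 2.0, 0.3])
    print(kl_regularized_weights(per_example_losses, tau=0.5))
```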
- A Stability Analysis of Fine-Tuning a Pre-Trained Model [46.6761331971071]
Fine-tuning a pre-trained model is one of the most promising paradigms in recent NLP research.
Fine-tuning suffers from the instability problem, i.e., tuning the same model under the same setting results in significantly different performance.
We propose a novel theoretical stability analysis of fine-tuning that focuses on two commonly used settings.
arXiv Detail & Related papers (2023-01-24T05:11:17Z)
- Explicit Tradeoffs between Adversarial and Natural Distributional Robustness [48.44639585732391]
In practice, models need to enjoy both types of robustness to ensure reliability.
In this work, we show that in fact, explicit tradeoffs exist between adversarial and natural distributional robustness.
arXiv Detail & Related papers (2022-09-15T19:58:01Z)
- Adversarial Robustness under Long-Tailed Distribution [93.50792075460336]
Adversarial robustness has recently attracted extensive study, revealing the vulnerability and intrinsic characteristics of deep networks.
In this work we investigate the adversarial vulnerability as well as defense under long-tailed distributions.
We propose a clean yet effective framework, RoBal, which consists of two dedicated modules: a scale-invariant classifier and data re-balancing.
arXiv Detail & Related papers (2021-04-06T17:53:08Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Reliable Post hoc Explanations: Modeling Uncertainty in Explainability [44.9824285459365]
Black box explanations are increasingly being employed to establish model credibility in high-stakes settings.
Prior work demonstrates that explanations generated by state-of-the-art techniques are inconsistent, unstable, and provide very little insight into their correctness and reliability.
We develop a novel Bayesian framework for generating local explanations along with their associated uncertainty (a simplified sketch of this idea follows this entry).
arXiv Detail & Related papers (2020-08-11T22:52:21Z)
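A rough, hypothetical sketch of the idea above, not the paper's method: fit a Bayesian linear surrogate to a black-box model's outputs on local perturbations of one input, so each feature importance comes with a posterior variance. The `model` stand-in, prior precision `alpha`, and noise variance are illustrative assumptions.
```python
# Hypothetical, simplified sketch of a Bayesian local surrogate: a conjugate
# Gaussian linear regression over perturbations of one input, giving a
# posterior mean and variance for each feature importance.
import numpy as np

rng = np.random.default_rng(0)

def model(X: np.ndarray) -> np.ndarray:
    """Stand-in black box; in practice this is the model being explained."""
    return 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(len(X))

def bayesian_local_explanation(x: np.ndarray, n_samples: int = 200,
                               scale: float = 0.1, alpha: float = 1.0,
                               noise_var: float = 0.01):
    """Posterior mean/variance of surrogate coefficients around point x."""
    X = x + scale * rng.standard_normal((n_samples, x.size))  # local perturbations
    y = model(X)
    # Conjugate posterior for weights with prior N(0, alpha^-1 I):
    A = X.T @ X / noise_var + alpha * np.eye(x.size)
    cov = np.linalg.inv(A)
    mean = cov @ X.T @ y / noise_var
    return mean, np.diag(cov)

if __name__ == "__main__":
    mean, var = bayesian_local_explanation(np.array([0.5, -0.2]))
    print("importances:", mean, "uncertainty:", var)
```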
- Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
This suggests using regularization as a practical tool for dealing with \textit{external uncertainty} in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)