Rethinking Stability for Attribution-based Explanations
- URL: http://arxiv.org/abs/2203.06877v1
- Date: Mon, 14 Mar 2022 06:19:27 GMT
- Title: Rethinking Stability for Attribution-based Explanations
- Authors: Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna,
Eshika Saxena, Marinka Zitnik, and Himabindu Lakkaraju
- Abstract summary: We introduce metrics to quantify the stability of an explanation and show that several popular explanation methods are unstable.
In particular, we propose new Relative Stability metrics that measure the change in output explanation with respect to change in input, model representation, or output of the underlying predictor.
- Score: 20.215505482157255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As attribution-based explanation methods are increasingly used to establish
model trustworthiness in high-stakes situations, it is critical to ensure that
these explanations are stable, e.g., robust to infinitesimal perturbations to
an input. However, previous works have shown that state-of-the-art explanation
methods generate unstable explanations. Here, we introduce metrics to quantify
the stability of an explanation and show that several popular explanation
methods are unstable. In particular, we propose new Relative Stability metrics
that measure the change in output explanation with respect to change in input,
model representation, or output of the underlying predictor. Finally, our
experimental evaluation with three real-world datasets demonstrates interesting
insights for seven explanation methods and different stability metrics.
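To make the proposed notion concrete, the following minimal sketch scores relative input stability as the worst-case ratio of the percent change in the explanation to the percent change in the input; the helper names, the elementwise percent-change convention, and the L_p norm are illustrative assumptions, not the paper's reference implementation. The representation- and output-based variants would swap the denominator for the change in internal activations or predictor outputs.

```python
import numpy as np

def percent_change(a, b, eps=1e-8):
    # Elementwise percent change of b relative to a, guarded against zeros.
    denom = np.where(np.abs(a) > eps, a, eps)
    return (a - b) / denom

def relative_input_stability(x, x_pert, e_x, e_x_pert, p=2, eps_min=1e-4):
    # Ratio of percent change in the explanation to percent change in the
    # input; larger values indicate a less stable explanation.
    # (Hypothetical helper; the paper's exact normalization may differ.)
    num = np.linalg.norm(percent_change(e_x, e_x_pert), ord=p)
    den = max(np.linalg.norm(percent_change(x, x_pert), ord=p), eps_min)
    return num / den

def max_relative_input_stability(x, perturbations, explain_fn, p=2):
    # Worst case over a set of nearby perturbations of the same input.
    e_x = explain_fn(x)
    return max(relative_input_stability(x, xp, e_x, explain_fn(xp), p=p)
               for xp in perturbations)
```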
Related papers
- Stable Update of Regression Trees [0.0]
We focus on the stability of an inherently explainable machine learning method, namely regression trees.
We propose a regularization method, where data points are weighted based on the uncertainty in the initial model.
Results show that the proposed update method improves stability while achieving similar or better predictive performance (a sketch of the weighting idea follows this entry).
arXiv Detail & Related papers (2024-02-21T09:41:56Z)
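A minimal sketch of how such uncertainty-based weighting might look, assuming scikit-learn trees, a leaf-level residual spread as the uncertainty proxy, and down-weighting of uncertain points; all three are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def leaf_uncertainty(tree, X, y):
    # Per-sample uncertainty proxy: residual spread of the initial tree's
    # leaf that each point falls into (an illustrative choice).
    leaves = tree.apply(X)
    resid = y - tree.predict(X)
    unc = np.empty(len(X))
    for leaf in np.unique(leaves):
        mask = leaves == leaf
        unc[mask] = resid[mask].std()
    return unc

def stable_update(initial_tree, X_new, y_new, **tree_kwargs):
    # Refit on new data, down-weighting points the initial model is
    # uncertain about; the weighting direction is itself an assumption.
    unc = leaf_uncertainty(initial_tree, X_new, y_new)
    weights = 1.0 / (1.0 + unc)  # high uncertainty -> low weight
    updated = DecisionTreeRegressor(**tree_kwargs)
    updated.fit(X_new, y_new, sample_weight=weights)
    return updated
```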
- On the stability, correctness and plausibility of visual explanation methods based on feature importance [0.0]
We study the interplay between the stability, correctness, and plausibility of explanations based on feature importance for image classifiers.
We show that the existing metrics for evaluating these properties do not always agree, raising the issue of what constitutes a good evaluation metric for explanations.
arXiv Detail & Related papers (2023-10-25T08:59:21Z)
- On Minimizing the Impact of Dataset Shifts on Actionable Explanations [14.83940426256441]
We conduct a rigorous theoretical analysis demonstrating that model curvature, weight-decay regularization during training, and the magnitude of the dataset shift are key factors that determine the extent of explanation (in)stability.
arXiv Detail & Related papers (2023-06-11T16:34:19Z)
- Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance [72.50214227616728]
Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group (a sketch of such an invariance check follows this entry).
arXiv Detail & Related papers (2023-04-13T17:59:03Z)
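A minimal sketch of how invariance and equivariance of explanations could be scored, assuming cosine similarity as the agreement measure and cyclic shifts as a stand-in symmetry group; both choices are illustrative rather than the paper's exact metrics.

```python
import numpy as np

def cosine(a, b, eps=1e-12):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def explanation_invariance(x, explain_fn, group_actions):
    # Mean agreement between the explanation of x and the explanations of
    # its symmetry-transformed copies; 1.0 means perfectly invariant.
    e_x = explain_fn(x)
    return float(np.mean([cosine(e_x, explain_fn(g(x))) for g in group_actions]))

def explanation_equivariance(x, explain_fn, group_actions):
    # Mean agreement between the transformed explanation and the
    # explanation of the transformed input.
    e_x = explain_fn(x)
    return float(np.mean([cosine(g(e_x), explain_fn(g(x))) for g in group_actions]))

# Example: cyclic shifts as the symmetry group for a shift-invariant model.
shifts = [lambda v, k=k: np.roll(v, k) for k in range(1, 4)]
```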
- Identifying the Source of Vulnerability in Explanation Discrepancy: A Case Study in Neural Text Classification [18.27912226867123]
Recent works have observed that post-hoc explanations become unstable when perturbations are applied to the model's inputs.
This has raised both interest in and concern about the stability of post-hoc explanations.
This work explores the potential sources of such instability.
arXiv Detail & Related papers (2022-12-10T16:04:34Z)
- Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance between counterfactual explanations and their initial observations (a sketch of the ensemble validity check follows this entry).
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
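One way to operationalize the ensemble view is to score a candidate counterfactual by the fraction of base learners it convinces; in the hedged sketch below, the acceptance threshold `tau` and the majority-style validity criterion are illustrative stand-ins for the paper's probabilistic formulation.

```python
import numpy as np

def counterfactual_validity(cf, base_learners, target_class):
    # Fraction of base learners that assign the counterfactual to the
    # target class: a probabilistic proxy for robustness under the
    # randomness of the ensemble (illustrative, not the paper's objective).
    votes = [model.predict(cf.reshape(1, -1))[0] == target_class
             for model in base_learners]
    return float(np.mean(votes))

def is_robust(cf, base_learners, target_class, tau=0.9):
    # Accept a counterfactual only if at least a tau fraction of base
    # learners agree with it.
    return counterfactual_validity(cf, base_learners, target_class) >= tau
```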
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates whether the model's prediction on the counterfactual is consistent with the expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
In the field of process outcome prediction, we define explainability through the interpretability of the explanations and the faithfulness of the explainability model.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Reliable Post hoc Explanations: Modeling Uncertainty in Explainability [44.9824285459365]
Black-box explanations are increasingly being employed to establish model credibility in high-stakes settings.
Prior work demonstrates that explanations generated by state-of-the-art techniques are inconsistent, unstable, and provide very little insight into their correctness and reliability.
We develop a novel Bayesian framework for generating local explanations along with their associated uncertainty (a sketch of the idea follows this entry).
arXiv Detail & Related papers (2020-08-11T22:52:21Z)
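The idea of attaching uncertainty to a local explanation can be sketched with a bootstrap over a LIME-style linear surrogate; note that this substitutes a simple bootstrap for the paper's Bayesian posterior, and all names and hyperparameters here are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_explanation_with_uncertainty(x, predict_fn, n_samples=500,
                                       n_boot=100, sigma=0.1, seed=0):
    # Fit a local linear surrogate around x, then bootstrap the fit to get
    # an interval for each feature importance (a sketch of the general
    # idea, not the paper's Bayesian posterior).
    rng = np.random.default_rng(seed)
    Z = x + sigma * rng.standard_normal((n_samples, x.size))  # local samples
    y = predict_fn(Z)                                         # black-box outputs
    coefs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_samples, n_samples)           # bootstrap resample
        coefs.append(Ridge(alpha=1.0).fit(Z[idx], y[idx]).coef_)
    coefs = np.array(coefs)
    return coefs.mean(axis=0), np.percentile(coefs, [2.5, 97.5], axis=0)
```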