Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
- URL: http://arxiv.org/abs/2008.05030v4
- Date: Sat, 6 Nov 2021 18:35:28 GMT
- Title: Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
- Authors: Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju
- Abstract summary: Black box explanations are increasingly being employed to establish model credibility in high-stakes settings.
However, prior work demonstrates that explanations generated by state-of-the-art techniques are inconsistent, unstable, and provide very little insight into their correctness and reliability.
We develop a novel Bayesian framework for generating local explanations along with their associated uncertainty.
- Score: 44.9824285459365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As black box explanations are increasingly being employed to establish model
credibility in high-stakes settings, it is important to ensure that these
explanations are accurate and reliable. However, prior work demonstrates that
explanations generated by state-of-the-art techniques are inconsistent,
unstable, and provide very little insight into their correctness and
reliability. In addition, these methods are also computationally inefficient,
and require significant hyper-parameter tuning. In this paper, we address the
aforementioned challenges by developing a novel Bayesian framework for
generating local explanations along with their associated uncertainty. We
instantiate this framework to obtain Bayesian versions of LIME and KernelSHAP
which output credible intervals for the feature importances, capturing the
associated uncertainty. The resulting explanations not only enable us to make
concrete inferences about their quality (e.g., there is a 95% chance that the
feature importance lies within the given range), but are also highly consistent
and stable. We carry out a detailed theoretical analysis that leverages the
aforementioned uncertainty to estimate how many perturbations to sample, and
how to sample for faster convergence. This work makes the first attempt at
addressing several critical issues with popular explanation methods in one
shot, thereby generating consistent, stable, and reliable explanations with
guarantees in a computationally efficient manner. Experimental evaluation with
multiple real-world datasets and user studies demonstrates the efficacy of the
proposed framework.
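The abstract describes the idea only at a high level; as a rough sketch of what a Bayesian local explanation can look like (not the authors' BayesLIME/BayesSHAP code, and omitting their proximity kernel and perturbation-count analysis), the snippet below fits a Bayesian linear surrogate to LIME-style binary perturbations so that each feature importance comes with a posterior 95% credible interval. The black-box model, baseline values, and perturbation count are illustrative assumptions.

```python
# Minimal sketch: LIME-style local explanation with a Bayesian linear surrogate,
# so each feature importance carries a credible interval rather than a point estimate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)

# Stand-in black-box model to be explained (any classifier would do).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0]                    # instance to explain
baseline = X.mean(axis=0)   # "feature off" values for perturbations (assumption)
n_perturbations = 1000

# Binary interpretable representation: 1 = keep the feature, 0 = replace it with
# the baseline value; query the black box on each perturbed point.
Z = rng.integers(0, 2, size=(n_perturbations, x.size))
X_pert = np.where(Z == 1, x, baseline)
y_pert = black_box.predict_proba(X_pert)[:, 1]

# Bayesian surrogate: the posterior over coefficients gives feature importances
# plus uncertainty (posterior covariance is stored in `sigma_`).
surrogate = BayesianRidge().fit(Z, y_pert)
means = surrogate.coef_
stds = np.sqrt(np.diag(surrogate.sigma_))

for j, (m, s) in enumerate(zip(means, stds)):
    lo, hi = m - 1.96 * s, m + 1.96 * s   # approximate 95% credible interval
    print(f"feature {j}: importance {m:+.3f}, 95% CI [{lo:+.3f}, {hi:+.3f}]")
```

A wide interval signals that more perturbations are needed before the corresponding importance can be trusted, which is the kind of inference the paper's framework formalizes with its convergence analysis.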
Related papers
- Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z)
- FTS: A Framework to Find a Faithful TimeSieve [43.46528328262752]
We propose a novel framework aimed at identifying and rectifying unfaithfulness in TimeSieve.
Our framework is designed to enhance the model's stability and faithfulness, ensuring that its outputs are less susceptible to the aforementioned factors.
arXiv Detail & Related papers (2024-05-30T02:59:49Z)
- LaPLACE: Probabilistic Local Model-Agnostic Causal Explanations [1.0370398945228227]
We introduce LaPLACE-explainer, designed to provide probabilistic cause-and-effect explanations for machine learning models.
The LaPLACE-Explainer component leverages the concept of a Markov blanket to establish statistical boundaries between relevant and non-relevant features.
Our approach offers causal explanations and outperforms LIME and SHAP in terms of local accuracy and consistency of explained features.
arXiv Detail & Related papers (2023-10-01T04:09:59Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Calibrated Explanations: with Uncertainty Information and Counterfactuals [0.1843404256219181]
Calibrated Explanations (CE) is built on the foundation of Venn-Abers.
It provides uncertainty quantification for both feature weights and the model's probability estimates.
Results from an evaluation with 25 benchmark datasets underscore the efficacy of CE.
arXiv Detail & Related papers (2023-05-03T17:52:41Z)
- SoK: Modeling Explainability in Security Analytics for Interpretability, Trustworthiness, and Usability [2.656910687062026]
Interpretability, trustworthiness, and usability are key considerations in high-stakes security applications.
Deep learning models behave as black boxes in which identifying important features and factors that led to a classification or a prediction is difficult.
Most explanation methods provide inconsistent explanations, have low fidelity, and are susceptible to adversarial manipulation.
arXiv Detail & Related papers (2022-10-31T15:01:49Z)
- Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
- On the Trustworthiness of Tree Ensemble Explainability Methods [0.9558392439655014]
Feature importance methods (e.g. gain and SHAP) are among the most popular explainability methods used to address this need.
For any explainability technique to be trustworthy and meaningful, it has to provide an explanation that is accurate and stable.
We evaluate the accuracy and stability of global feature importance methods through comprehensive experiments done on simulations and four real-world datasets.
arXiv Detail & Related papers (2021-09-30T20:56:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.