Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
- URL: http://arxiv.org/abs/2008.05030v4
- Date: Sat, 6 Nov 2021 18:35:28 GMT
- Title: Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
- Authors: Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju
- Abstract summary: Black box explanations are increasingly being employed to establish model credibility in high-stakes settings.
Prior work demonstrates that explanations generated by state-of-the-art techniques are inconsistent, unstable, and provide very little insight into their correctness and reliability.
We develop a novel Bayesian framework for generating local explanations along with their associated uncertainty.
- Score: 44.9824285459365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As black box explanations are increasingly being employed to establish model
credibility in high-stakes settings, it is important to ensure that these
explanations are accurate and reliable. However, prior work demonstrates that
explanations generated by state-of-the-art techniques are inconsistent,
unstable, and provide very little insight into their correctness and
reliability. In addition, these methods are also computationally inefficient,
and require significant hyper-parameter tuning. In this paper, we address the
aforementioned challenges by developing a novel Bayesian framework for
generating local explanations along with their associated uncertainty. We
instantiate this framework to obtain Bayesian versions of LIME and KernelSHAP
which output credible intervals for the feature importances, capturing the
associated uncertainty. The resulting explanations not only enable us to make
concrete inferences about their quality (e.g., there is a 95% chance that the
feature importance lies within the given range), but are also highly consistent
and stable. We carry out a detailed theoretical analysis that leverages the
aforementioned uncertainty to estimate how many perturbations to sample, and
how to sample for faster convergence. This work makes the first attempt at
addressing several critical issues with popular explanation methods in one
shot, thereby generating consistent, stable, and reliable explanations with
guarantees in a computationally efficient manner. Experimental evaluation with
multiple real-world datasets and user studies demonstrates the efficacy of
the proposed framework.
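To make the idea of credible intervals for feature importances concrete, here is a minimal Python sketch of a Bayesian local linear surrogate. It is not the authors' implementation: it assumes Gaussian perturbations around the instance, an exponential proximity kernel, a zero-mean Gaussian prior on the coefficients, and a fixed, known noise variance, which makes the posterior Gaussian in closed form. All function and parameter names (`bayes_local_explanation`, `perturbations_needed`, `kernel_width`, `prior_var`, `noise_var`) are hypothetical.

```python
import math
import numpy as np


def bayes_local_explanation(f, x, n_samples=500, sigma_perturb=0.5,
                            kernel_width=1.0, prior_var=10.0,
                            noise_var=0.05, seed=0):
    """Fit a weighted Bayesian linear surrogate around x.

    Returns the posterior mean and the 95% credible interval bounds
    for each feature coefficient (the intercept is dropped).
    Assumes a known noise variance, which is a simplification.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Sample perturbations around the instance being explained.
    Z = x + sigma_perturb * rng.standard_normal((n_samples, d))
    y = f(Z)
    # Proximity weights: perturbations closer to x count more.
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / kernel_width ** 2)
    # Design matrix with an intercept column, centered on x.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    # Conjugate Gaussian posterior for weighted linear regression.
    precision = np.eye(d + 1) / prior_var + (A.T * w) @ A / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ ((A.T * w) @ y) / noise_var
    sd = np.sqrt(np.diag(cov))
    lower, upper = mean - 1.96 * sd, mean + 1.96 * sd
    return mean[1:], lower[1:], upper[1:]


def perturbations_needed(current_n, current_width, target_width):
    """Heuristic: interval width shrinks roughly like 1 / sqrt(n)."""
    return math.ceil(current_n * (current_width / target_width) ** 2)


if __name__ == "__main__":
    # Hypothetical black box: a linear model with a known ground truth.
    beta = np.array([2.0, -1.0, 0.0])
    black_box = lambda Z: Z @ beta + 0.3
    x0 = np.array([1.0, 2.0, 3.0])
    mean, lower, upper = bayes_local_explanation(black_box, x0)
    for j in range(3):
        print(f"feature {j}: {mean[j]:+.3f} in [{lower[j]:+.3f}, {upper[j]:+.3f}]")
```

The second helper only illustrates the kind of reasoning the abstract alludes to when it mentions estimating how many perturbations to sample: under the stated 1/sqrt(n) scaling assumption, halving the interval width requires roughly four times as many perturbations.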
Related papers
- Reliable Explanations or Random Noise? A Reliability Metric for XAI [6.948460965107209]
We introduce the Explanation Reliability Index (ERI), a family of metrics that quantifies explanation stability under four reliability axioms. ERI enables principled assessment of explanation reliability and supports more trustworthy explainable AI (XAI) systems.
arXiv Detail & Related papers (2026-02-04T22:04:07Z) - Explanation Multiplicity in SHAP: Characterization and Assessment [28.413883186555438]
Post-hoc explanations are widely used to justify, contest, and review automated decisions in high-stakes domains such as lending, employment, and healthcare. In practice, however, SHAP explanations can differ substantially across repeated runs, even when the individual, prediction task, and trained model are held fixed. We conceptualize and name this phenomenon explanation multiplicity: the existence of multiple, internally valid but substantively different explanations for the same decision.
arXiv Detail & Related papers (2026-01-19T02:01:18Z) - Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency [78.91846841708586]
We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. We propose Neighbor-Consistency Belief (NCB), a structural measure of belief that evaluates response coherence across a conceptual neighborhood. We also present Structure-Aware Training (SAT), which optimizes context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%.
arXiv Detail & Related papers (2026-01-09T16:23:21Z) - ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning [2.1461777157838724]
We introduce ReasonBENCH, the first benchmark designed to quantify the underlying instability in large language model (LLM) reasoning. Across tasks from different domains, we find that the vast majority of reasoning strategies and models exhibit high instability. We further analyze the impact of prompts, model families, and scale on the trade-off between solve rate and stability.
arXiv Detail & Related papers (2025-12-08T18:26:58Z) - Probabilistic Modeling of Disparity Uncertainty for Robust and Efficient Stereo Matching [61.73532883992135]
We propose a new uncertainty-aware stereo matching framework.
We adopt Bayes risk as the measurement of uncertainty and use it to separately estimate data and model uncertainty.
arXiv Detail & Related papers (2024-12-24T23:28:20Z) - Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - FTS: A Framework to Find a Faithful TimeSieve [43.46528328262752]
We propose a novel framework aimed at identifying and rectifying unfaithfulness in TimeSieve.
Our framework is designed to enhance the model's stability and faithfulness, ensuring that its outputs are less susceptible to the aforementioned factors.
arXiv Detail & Related papers (2024-05-30T02:59:49Z) - LaPLACE: Probabilistic Local Model-Agnostic Causal Explanations [1.0370398945228227]
We introduce LaPLACE-explainer, designed to provide probabilistic cause-and-effect explanations for machine learning models.
The LaPLACE-Explainer component leverages the concept of a Markov blanket to establish statistical boundaries between relevant and non-relevant features.
Our approach offers causal explanations and outperforms LIME and SHAP in terms of local accuracy and consistency of explained features.
arXiv Detail & Related papers (2023-10-01T04:09:59Z) - Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z) - Calibrated Explanations: with Uncertainty Information and Counterfactuals [0.1843404256219181]
Calibrated Explanations (CE) is built on the foundation of Venn-Abers.
It provides uncertainty quantification for both feature weights and the model's probability estimates.
Results from an evaluation with 25 benchmark datasets underscore the efficacy of CE.
arXiv Detail & Related papers (2023-05-03T17:52:41Z) - SoK: Modeling Explainability in Security Analytics for Interpretability, Trustworthiness, and Usability [2.656910687062026]
Interpretability, trustworthiness, and usability are key considerations in high-stakes security applications.
Deep learning models behave as black boxes in which identifying important features and factors that led to a classification or a prediction is difficult.
Most explanation methods provide inconsistent explanations, have low fidelity, and are susceptible to adversarial manipulation.
arXiv Detail & Related papers (2022-10-31T15:01:49Z) - Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem.
We show the link between the robustness of ensemble models and the robustness of base learners.
Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z) - Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z) - Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z) - On the Trustworthiness of Tree Ensemble Explainability Methods [0.9558392439655014]
Feature importance methods (e.g. gain and SHAP) are among the most popular explainability methods used to address this need.
For any explainability technique to be trustworthy and meaningful, it has to provide an explanation that is accurate and stable.
We evaluate the accuracy and stability of global feature importance methods through comprehensive experiments done on simulations and four real-world datasets.
arXiv Detail & Related papers (2021-09-30T20:56:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.