Reliable Explanations or Random Noise? A Reliability Metric for XAI
- URL: http://arxiv.org/abs/2602.05082v1
- Date: Wed, 04 Feb 2026 22:04:07 GMT
- Title: Reliable Explanations or Random Noise? A Reliability Metric for XAI
- Authors: Poushali Sengupta, Sabita Maharjan, Frank Eliassen, Shashi Raj Pandey, Yan Zhang,
- Abstract summary: We introduce the Explanation Reliability Index (ERI), a family of metrics that quantifies explanation stability under four reliability axioms.<n>ERI enables principled assessment of explanation reliability and supports more trustworthy AI (XAI) systems.
- Score: 6.948460965107209
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, explaining decisions made by complex machine learning models has become essential in high-stakes domains such as energy systems, healthcare, finance, and autonomous systems. However, the reliability of these explanations, namely, whether they remain stable and consistent under realistic, non-adversarial changes, remains largely unmeasured. Widely used methods such as SHAP and Integrated Gradients (IG) are well-motivated by axiomatic notions of attribution, yet their explanations can vary substantially even under system-level conditions, including small input perturbations, correlated representations, and minor model updates. Such variability undermines explanation reliability, as reliable explanations should remain consistent across equivalent input representations and small, performance-preserving model changes. We introduce the Explanation Reliability Index (ERI), a family of metrics that quantifies explanation stability under four reliability axioms: robustness to small input perturbations, consistency under feature redundancy, smoothness across model evolution, and resilience to mild distributional shifts. For each axiom, we derive formal guarantees, including Lipschitz-type bounds and temporal stability results. We further propose ERI-T, a dedicated measure of temporal reliability for sequential models, and introduce ERI-Bench, a benchmark designed to systematically stress-test explanation reliability across synthetic and real-world datasets. Experimental results reveal widespread reliability failures in popular explanation methods, showing that explanations can be unstable under realistic deployment conditions. By exposing and quantifying these instabilities, ERI enables principled assessment of explanation reliability and supports more trustworthy explainable AI (XAI) systems.
Related papers
- Measuring the Fragility of Trust: Devising Credibility Index via Explanation Stability (CIES) for Business Decision Support Systems [3.8615905456206256]
This paper introduces the Credibility Index via Explanation Stability (CIES), a metric that measures how robust a model's explanations are when subject to realistic business noise.<n>CIES captures whether the reasons behind a prediction remain consistent, not just the prediction itself.<n>Results demonstrate that model complexity impacts explanation credibility, class imbalance treatment via SMOTE affects not only predictive performance but also explanation stability.
arXiv Detail & Related papers (2026-03-05T10:11:55Z) - Equivariant Evidential Deep Learning for Interatomic Potentials [55.6997213490859]
Uncertainty quantification is critical for assessing the reliability of machine learning interatomic potentials in molecular dynamics simulations.<n>Existing UQ approaches for MLIPs are often limited by high computational cost or suboptimal performance.<n>We propose textitEquivariant Evidential Deep Learning for Interatomic Potentials ($texte2$IP), a backbone-agnostic framework that models atomic forces and their uncertainty jointly.
arXiv Detail & Related papers (2026-02-11T02:00:25Z) - Explanation Multiplicity in SHAP: Characterization and Assessment [28.413883186555438]
Post-hoc explanations are widely used to justify, contest, and review automated decisions in high-stakes domains such as lending, employment, and healthcare.<n>In practice, however, SHAP explanations can differ substantially across repeated runs, even when the individual, prediction task, and trained model are held fixed.<n>We conceptualize and name this phenomenon explanation multiplicity: the existence of multiple, internally valid but substantively different explanations for the same decision.
arXiv Detail & Related papers (2026-01-19T02:01:18Z) - PaTAS: A Parallel System for Trust Propagation in Neural Networks Using Subjective Logic [1.8485970721272895]
This paper introduces the Parallel Trust Assessment System (PaTAS), a framework for modeling and propagating trust in neural networks.<n>Experiments on real-world and adversarial datasets demonstrate that PaTAS produces interpretable, symmetric, and convergent trust estimates.
arXiv Detail & Related papers (2025-11-25T18:15:36Z) - Fair and Explainable Credit-Scoring under Concept Drift: Adaptive Explanation Frameworks for Evolving Populations [0.0]
We develop adaptive explanation frameworks that recalibrate interpretability and fairness in dynamically evolving credit models.<n>Results show that adaptive methods, particularly rebaselined and surrogate-based explanations, substantially improve temporal stability and reduce disparate impact across demographic groups without degrading predictive accuracy.<n>These findings establish adaptive explainability as a practical mechanism for sustaining transparency, accountability, and ethical reliability in data-driven credit systems.
arXiv Detail & Related papers (2025-11-05T19:14:43Z) - On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
multimodal generative models have sparked critical discussions on their reliability, fairness and potential for misuse.<n>We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space.<n>Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
arXiv Detail & Related papers (2024-11-21T09:46:55Z) - Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal
Approach [51.012396632595554]
Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments.
Recent theoretical results verified that some causal features recovered by IRLs merely pretend domain-invariantly in the training environments but fail in unseen domains.
We develop an approach based on conditional mutual information with respect to RS-SCM, then rigorously rectify the spurious and fake invariant effects.
arXiv Detail & Related papers (2023-12-15T12:58:05Z) - LaPLACE: Probabilistic Local Model-Agnostic Causal Explanations [1.0370398945228227]
We introduce LaPLACE-explainer, designed to provide probabilistic cause-and-effect explanations for machine learning models.
The LaPLACE-Explainer component leverages the concept of a Markov blanket to establish statistical boundaries between relevant and non-relevant features.
Our approach offers causal explanations and outperforms LIME and SHAP in terms of local accuracy and consistency of explained features.
arXiv Detail & Related papers (2023-10-01T04:09:59Z) - Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation [50.920911532133154]
The intrinsic ill-posedness and ordinal-sensitive nature of monocular depth estimation (MDE) models pose major challenges to the estimation of uncertainty degree.
We propose to model the uncertainty of MDE models from the perspective of the inherent probability distributions.
By simply introducing additional training regularization terms, our model, with surprisingly simple formations and without requiring extra modules or multiple inferences, can provide uncertainty estimations with state-of-the-art reliability.
arXiv Detail & Related papers (2023-07-19T12:11:15Z) - Probabilities Are Not Enough: Formal Controller Synthesis for Stochastic
Dynamical Models with Epistemic Uncertainty [68.00748155945047]
Capturing uncertainty in models of complex dynamical systems is crucial to designing safe controllers.
Several approaches use formal abstractions to synthesize policies that satisfy temporal specifications related to safety and reachability.
Our contribution is a novel abstraction-based controller method for continuous-state models with noise, uncertain parameters, and external disturbances.
arXiv Detail & Related papers (2022-10-12T07:57:03Z) - Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z) - Reliable Post hoc Explanations: Modeling Uncertainty in Explainability [44.9824285459365]
Black box explanations are increasingly being employed to establish model credibility in high-stakes settings.
prior work demonstrates that explanations generated by state-of-the-art techniques are inconsistent, unstable, and provide very little insight into their correctness and reliability.
We develop a novel Bayesian framework for generating local explanations along with their associated uncertainty.
arXiv Detail & Related papers (2020-08-11T22:52:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.