Are machine learning interpretations reliable? A stability study on global interpretations
- URL: http://arxiv.org/abs/2505.15728v1
- Date: Wed, 21 May 2025 16:34:11 GMT
- Title: Are machine learning interpretations reliable? A stability study on global interpretations
- Authors: Luqin Gan, Tarek M. Zikry, Genevera I. Allen
- Abstract summary: We conduct the first systematic, large-scale empirical stability study on popular machine learning global interpretations. Our findings reveal that popular interpretation methods are frequently unstable, notably less stable than the predictions themselves. No single method consistently provides the most stable interpretations across a range of benchmark datasets.
- Score: 3.9325957466009207
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As machine learning systems are increasingly used in high-stakes domains, there is a growing emphasis placed on making them interpretable to improve trust in these systems. In response, a range of interpretable machine learning (IML) methods have been developed to generate human-understandable insights into otherwise black box models. With these methods, a fundamental question arises: Are these interpretations reliable? Unlike with prediction accuracy or other evaluation metrics for supervised models, the proximity to the true interpretation is difficult to define. Instead, we ask a closely related question that we argue is a prerequisite for reliability: Are these interpretations stable? We define stability as findings that are consistent or reliable under small random perturbations to the data or algorithms. In this study, we conduct the first systematic, large-scale empirical stability study on popular machine learning global interpretations for both supervised and unsupervised tasks on tabular data. Our findings reveal that popular interpretation methods are frequently unstable, notably less stable than the predictions themselves, and that there is no association between the accuracy of machine learning predictions and the stability of their associated interpretations. Moreover, we show that no single method consistently provides the most stable interpretations across a range of benchmark datasets. Overall, these results suggest that interpretability alone does not warrant trust, and underscores the need for rigorous evaluation of interpretation stability in future work. To support these principles, we have developed and released an open source IML dashboard and Python package to enable researchers to assess the stability and reliability of their own data-driven interpretations and discoveries.
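The perturb-and-compare protocol described in the abstract can be made concrete with a short sketch. The example below is not the authors' released dashboard or Python package; it is a minimal illustration, under assumptions of my own (a random forest, impurity-based importances, 20 bootstrap resamples), of one way to probe stability: refit the model on perturbed training data, collect a global interpretation, and compare both interpretations and predictions across repetitions.

```python
# Minimal illustrative sketch (not the authors' released package): probe the
# stability of a global interpretation (random-forest feature importances)
# under bootstrap perturbations of the training data, and compare it with the
# stability of the test-set predictions themselves.
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
importances, predictions = [], []
for _ in range(20):  # 20 small random perturbations (bootstrap resamples)
    idx = rng.integers(0, len(X_tr), len(X_tr))
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_tr[idx], y_tr[idx])
    importances.append(clf.feature_importances_)  # global interpretation
    predictions.append(clf.predict(X_te))         # predictions, for comparison

def topk_jaccard(a, b, k=5):
    """Overlap of the k most important features between two perturbed runs."""
    ta, tb = set(np.argsort(a)[-k:]), set(np.argsort(b)[-k:])
    return len(ta & tb) / len(ta | tb)

pairs = list(combinations(range(len(importances)), 2))
rank_stab = np.mean([spearmanr(importances[i], importances[j])[0] for i, j in pairs])
topk_stab = np.mean([topk_jaccard(importances[i], importances[j]) for i, j in pairs])
pred_stab = np.mean([np.mean(predictions[i] == predictions[j]) for i, j in pairs])

print(f"mean Spearman correlation of importance rankings: {rank_stab:.3f}")
print(f"mean top-5 Jaccard overlap of importances:        {topk_stab:.3f}")
print(f"mean pairwise agreement of test predictions:      {pred_stab:.3f}")
```

A gap between high prediction agreement and lower agreement of the importance rankings would mirror, in miniature, the paper's observation that interpretations can be less stable than the predictions they accompany.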
Related papers
- Aurora: Are Android Malware Classifiers Reliable and Stable under Distribution Shift? [51.12297424766236]
AURORA is a framework to evaluate malware classifiers based on their confidence quality and operational resilience. It is complemented by a set of metrics designed to go beyond point-in-time performance. The fragility of SOTA frameworks across datasets of varying drift suggests the need for a return to the whiteboard.
arXiv Detail & Related papers (2025-05-28T20:22:43Z)
- Probabilistic Modeling of Disparity Uncertainty for Robust and Efficient Stereo Matching [61.73532883992135]
We propose a new uncertainty-aware stereo matching framework. We adopt Bayes risk as the measurement of uncertainty and use it to separately estimate data and model uncertainty.
arXiv Detail & Related papers (2024-12-24T23:28:20Z)
- On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
Multimodal generative models have sparked critical discussions on their reliability, fairness and potential for misuse. We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space. Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
- Automated Trustworthiness Testing for Machine Learning Classifiers [3.3423762257383207]
This paper proposes TOWER, the first technique to automatically create trustworthiness oracles that determine whether text classifier predictions are trustworthy.
Our hypothesis is that a prediction is trustworthy if the words in its explanation are semantically related to the predicted class.
The results show that TOWER can detect a decrease in trustworthiness as noise increases, but is not effective when evaluated against the human-labeled dataset.
arXiv Detail & Related papers (2024-06-07T20:25:05Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
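The summary sketches the loop (generate clarifications, query the model with each, ensemble the predictions) without an interface. Below is a rough, hypothetical sketch of that loop: `query_llm`, the clarification strings, and the entropy-based split of the uncertainty are all illustrative assumptions, not the paper's actual components.

```python
# Illustrative sketch only: `query_llm` is a placeholder for a real model call,
# the clarification strings are made up, and the entropy-based split below is a
# generic decomposition, not necessarily the paper's exact estimator.
from typing import Callable, Sequence
import numpy as np

def clarification_ensemble(
    question: str,
    clarifications: Sequence[str],
    query_llm: Callable[[str], np.ndarray],  # returns a probability vector over answer options
):
    # Query the model once per clarified version of the input.
    probs = np.stack([query_llm(f"{question}\nClarification: {c}") for c in clarifications])
    mean_probs = probs.mean(axis=0)  # ensembled prediction

    def entropy(p: np.ndarray) -> float:
        return float(-(p * np.log(np.clip(p, 1e-12, None))).sum())

    total = entropy(mean_probs)                           # total predictive uncertainty
    within = float(np.mean([entropy(p) for p in probs]))  # uncertainty remaining after clarification
    across = total - within                               # disagreement induced by input ambiguity
    return mean_probs, {"total": total, "within": within, "across": across}

# Toy usage with a stubbed "LLM" that returns a fixed distribution per prompt.
if __name__ == "__main__":
    stub = lambda prompt: np.array([0.8, 0.2]) if "river" in prompt else np.array([0.3, 0.7])
    pred, unc = clarification_ensemble(
        "Is the bank open right now?",
        ["the bank of the river", "the financial institution"],
        stub,
    )
    print(pred, unc)
```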
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z)
- Calibrate to Interpret [0.966840768820136]
We show a first link between uncertainty and explainability by studying the relation between calibration and interpretation.
In the context of networks trained on image classification tasks, we show to what extent interpretations are sensitive to confidence calibration.
This leads us to suggest a simple practice to improve interpretation outcomes: Calibrate to Interpret.
arXiv Detail & Related papers (2022-07-07T14:30:52Z)
- Inference for Interpretable Machine Learning: Fast, Model-Agnostic Confidence Intervals for Feature Importance [1.2891210250935146]
We develop confidence intervals for a widely-used form of machine learning interpretation: feature importance.
We do so by leveraging a form of random observation and feature subsampling called minipatch ensembles.
Our approach is fast, as the computations needed for inference come nearly for free as part of the ensemble learning process.
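The blurb names the device (random subsampling of both observations and features) without showing it. The sketch below illustrates that minipatch-style subsampling under simplifying assumptions: a ridge model, an absolute-coefficient importance score, and naive percentile intervals stand in for the paper's actual estimator and inference procedure.

```python
# Rough sketch of minipatch-style subsampling of observations and features.
# The absolute-coefficient importance score and the naive percentile intervals
# are simplifications, not the paper's estimator or inference procedure.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=400, n_features=10, n_informative=3, noise=5.0, random_state=0)
n, p = X.shape
rng = np.random.default_rng(0)

n_patches, n_obs, n_feat = 500, 80, 4
scores = [[] for _ in range(p)]

for _ in range(n_patches):
    obs = rng.choice(n, size=n_obs, replace=False)     # random subset of observations
    feats = rng.choice(p, size=n_feat, replace=False)  # random subset of features
    model = Ridge(alpha=1.0).fit(X[np.ix_(obs, feats)], y[obs])
    for f, coef in zip(feats, model.coef_):
        scores[f].append(abs(coef))                    # crude per-minipatch importance

for f in range(p):
    s = np.array(scores[f])
    lo, hi = np.percentile(s, [2.5, 97.5])
    print(f"feature {f}: mean importance {s.mean():6.2f}, ~95% interval [{lo:6.2f}, {hi:6.2f}]")
```

Because each minipatch refits only a small submodel, the many repetitions needed for the intervals stay cheap, which is the "nearly for free" point made in the summary.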
arXiv Detail & Related papers (2022-06-05T03:14:48Z)
- On the Trustworthiness of Tree Ensemble Explainability Methods [0.9558392439655014]
Feature importance methods (e.g. gain and SHAP) are among the most popular explainability methods used to address this need.
For any explainability technique to be trustworthy and meaningful, it has to provide an explanation that is accurate and stable.
We evaluate the accuracy and stability of global feature importance methods through comprehensive experiments done on simulations and four real-world datasets.
arXiv Detail & Related papers (2021-09-30T20:56:37Z)
- Reliable Post hoc Explanations: Modeling Uncertainty in Explainability [44.9824285459365]
Black box explanations are increasingly being employed to establish model credibility in high-stakes settings.
Prior work demonstrates that explanations generated by state-of-the-art techniques are inconsistent, unstable, and provide very little insight into their correctness and reliability.
We develop a novel Bayesian framework for generating local explanations along with their associated uncertainty.
arXiv Detail & Related papers (2020-08-11T22:52:21Z)
- The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions [93.62888099134028]
We find that the performance of state-of-the-art models on Natural Language Inference (NLI) and Reading Comprehension (RC) analysis/stress sets can be highly unstable.
This raises three questions: (1) How will the instability affect the reliability of conclusions drawn from these analysis sets?
We give both theoretical explanations and empirical evidence regarding the source of the instability.
arXiv Detail & Related papers (2020-04-28T15:41:12Z)
- Statistical stability indices for LIME: obtaining reliable explanations for Machine Learning models [60.67142194009297]
The ever-increasing use of Machine Learning techniques is the clearest example of this trend.
It is often very difficult to understand on what grounds such an algorithm reached its decision.
It is important for the practitioner to be aware of the issue, as well as to have a tool for spotting it.
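One concrete way to see the issue this paper addresses is to re-run LIME on the same instance and watch the feature weights move. The sketch below does exactly that; it assumes the `lime` package is installed, and it summarizes the run-to-run variability with a generic coefficient of variation rather than the statistical stability indices proposed in the paper.

```python
# Minimal sketch of spotting LIME instability by re-running the explainer.
# Assumes the `lime` package is installed; the variability summary below is a
# generic one, not the statistical stability indices proposed in the paper.
import numpy as np
from collections import defaultdict
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

data = load_wine()
clf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

weights = defaultdict(list)
for seed in range(10):  # repeat the explanation of the same instance with different seeds
    explainer = LimeTabularExplainer(
        data.data, feature_names=data.feature_names,
        class_names=list(data.target_names), mode="classification", random_state=seed,
    )
    exp = explainer.explain_instance(data.data[0], clf.predict_proba, num_features=5)
    for feature, weight in exp.as_list():
        weights[feature].append(weight)

for feature, w in weights.items():
    w = np.array(w)
    cv = np.std(w) / (abs(np.mean(w)) + 1e-12)  # coefficient of variation across runs
    print(f"{feature}: in {len(w)}/10 runs, mean weight {np.mean(w):+.3f}, CV {cv:.2f}")
```

Features that appear in only a few of the repeated runs, or whose weights have a large coefficient of variation, are exactly the kind of instability a practitioner would want flagged before trusting the explanation.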
arXiv Detail & Related papers (2020-01-31T10:39:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.