Related papers: How can we trust opaque systems? Criteria for robust explanations in XAI

URL: http://arxiv.org/abs/2508.12623v1
Date: Mon, 18 Aug 2025 04:38:55 GMT
Title: How can we trust opaque systems? Criteria for robust explanations in XAI
Authors: Florian J. Boge, Annika Schuster
Abstract summary: Deep learning (DL) algorithms are becoming ubiquitous in everyday life and in scientific research. It is unknown to laypeople and researchers alike what features of the data a DL system focuses on and how it ultimately succeeds in predicting correct outputs. A necessary criterion for trustworthy explanations is that they should reflect the relevant processes the algorithms' predictions are based on.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep learning (DL) algorithms are becoming ubiquitous in everyday life and in scientific research. However, the price we pay for their impressively accurate predictions is significant: their inner workings are notoriously opaque - it is unknown to laypeople and researchers alike what features of the data a DL system focuses on and how it ultimately succeeds in predicting correct outputs. A necessary criterion for trustworthy explanations is that they should reflect the relevant processes the algorithms' predictions are based on. The field of eXplainable Artificial Intelligence (XAI) presents promising methods to create such explanations. But recent reviews about their performance offer reasons for skepticism. As we will argue, a good criterion for trustworthiness is explanatory robustness: different XAI methods produce the same explanations in comparable contexts. However, in some instances, all methods may give the same, but still wrong, explanation. We therefore argue that in addition to explanatory robustness (ER), a prior requirement of explanation method robustness (EMR) has to be fulfilled by every XAI method. Conversely, the robustness of an individual method is in itself insufficient for trustworthiness. In what follows, we develop and formalize criteria for ER as well as EMR, providing a framework for explaining and establishing trust in DL algorithms. We also highlight interesting application cases and outline directions for future work.
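
As a rough illustration of the explanatory robustness idea in the abstract (different XAI methods producing the same explanation in comparable contexts), the sketch below scores how far several feature-attribution vectors for one input agree on their top-k features. This is not the authors' formalization of ER or EMR: the function names, the Jaccard overlap measure, and the example attribution values are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def top_k_features(attribution: Sequence[float], k: int) -> Set[int]:
    """Return the indices of the k features with the largest absolute attribution."""
    ranked = sorted(range(len(attribution)),
                    key=lambda i: abs(attribution[i]),
                    reverse=True)
    return set(ranked[:k])


def pairwise_top_k_agreement(
    attributions: Dict[str, List[float]], k: int
) -> Dict[Tuple[str, str], float]:
    """Jaccard overlap of the top-k feature sets for every pair of methods.

    Assumes k >= 1 and non-empty attribution vectors, so the union is never empty.
    """
    scores = {}
    for (name_a, attr_a), (name_b, attr_b) in combinations(attributions.items(), 2):
        top_a = top_k_features(attr_a, k)
        top_b = top_k_features(attr_b, k)
        scores[(name_a, name_b)] = len(top_a & top_b) / len(top_a | top_b)
    return scores


# Hypothetical attribution vectors for a single input with five features.
example = {
    "method_a": [0.9, 0.1, -0.7, 0.0, 0.2],
    "method_b": [0.8, 0.0, -0.6, 0.1, 0.3],
    "method_c": [0.1, 0.9, 0.0, -0.8, 0.2],
}
agreement = pairwise_top_k_agreement(example, k=2)
for pair, score in agreement.items():
    print(pair, score)
```

High pairwise agreement is at most evidence of robustness in this sense. As the abstract notes, all methods may agree on the same wrong explanation, which is why the authors require each individual method to satisfy EMR first.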

Related papers

Explaining AI Without Code: A User Study on Explainable AI [1.7966001353008778]
We present a human-centered XAI module in DashAI, an open-source no-code ML platform. A user study evaluated usability and the impact of explanations on novices and experts.
arXiv Detail & Related papers (2025-12-28T15:44:43Z)
A Unified Framework for Evaluating the Effectiveness and Enhancing the Transparency of Explainable AI Methods in Real-World Applications [2.0681376988193843]
This study introduces a single evaluation framework for XAI. It uses both numbers and user feedback to check if the explanations are correct, easy to understand, fair, complete, and reliable. We show the value of this framework through case studies in healthcare, finance, farming, and self-driving systems.
arXiv Detail & Related papers (2024-12-05T05:30:10Z)
Explainable AI needs formal notions of explanation correctness [2.1309989863595677]
Machine learning in critical domains such as medicine poses risks and requires regulation. One requirement is that decisions of ML systems in high-risk applications should be human-understandable. In its current form, XAI is unfit to provide quality control for ML; it itself needs scrutiny.
arXiv Detail & Related papers (2024-09-22T20:47:04Z)
Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode. We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
Disagreement amongst counterfactual explanations: How transparency can be deceptive [0.0]
Counterfactual explanations are increasingly used as an Explainable Artificial Intelligence technique. Not every algorithm creates uniform explanations for the same instance. Ethical issues arise when malicious agents use this diversity to fairwash an unfair machine learning model.
arXiv Detail & Related papers (2023-04-25T09:15:37Z)
Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles [50.81061839052459]
We formalize the generation of robust counterfactual explanations as a probabilistic problem. We show the link between the robustness of ensemble models and the robustness of base learners. Our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.
arXiv Detail & Related papers (2022-05-27T17:28:54Z)
Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation. We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem. Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
TrustyAI Explainability Toolkit [1.0499611180329804]
We will look at how TrustyAI can support trust in decision services and predictive models. We investigate techniques such as LIME, SHAP and counterfactuals. We also look into an extended version of SHAP, which supports background data selection to be evaluated.
arXiv Detail & Related papers (2021-04-26T17:00:32Z)
Explainability in Deep Reinforcement Learning [68.8204255655161]
We review recent works in the direction to attain Explainable Reinforcement Learning (XRL). In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight on the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
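
The Average Thresholded Confidence (ATC) entry in the list above describes learning a threshold on model confidence and predicting target accuracy from unlabeled data. A minimal sketch of that idea follows. It assumes the threshold is chosen so that the fraction of labeled source examples at or above it matches the source accuracy; the function names and the toy confidence values are hypothetical, and this is not the paper's reference implementation.

```python
from typing import Sequence


def fit_threshold(source_conf: Sequence[float], source_correct: Sequence[int]) -> float:
    """Pick the confidence threshold whose pass rate on source data best matches source accuracy."""
    n = len(source_conf)
    accuracy = sum(source_correct) / n

    def pass_rate(t: float) -> float:
        # Fraction of source examples whose confidence is at or above t.
        return sum(c >= t for c in source_conf) / n

    # Candidate thresholds are the observed confidence values themselves.
    candidates = sorted(set(source_conf))
    return min(candidates, key=lambda t: abs(pass_rate(t) - accuracy))


def estimate_target_accuracy(target_conf: Sequence[float], threshold: float) -> float:
    """Predicted accuracy: fraction of unlabeled target examples at or above the threshold."""
    return sum(c >= threshold for c in target_conf) / len(target_conf)


# Toy labeled source data: confidences and whether each prediction was correct (1) or not (0).
source_conf = [0.95, 0.9, 0.8, 0.6, 0.55]
source_correct = [1, 1, 1, 0, 0]

# Toy unlabeled target data: confidences only.
target_conf = [0.85, 0.7, 0.9, 0.5]

threshold = fit_threshold(source_conf, source_correct)
estimate = estimate_target_accuracy(target_conf, threshold)
print(threshold, estimate)
```

The sketch needs labels only on the source side; the target estimate uses confidences alone, which matches the setting described in the entry.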

This list is automatically generated from the titles and abstracts of the papers on this site.