Uncertainty Quantification of Surrogate Explanations: an Ordinal
Consensus Approach
- URL: http://arxiv.org/abs/2111.09121v1
- Date: Wed, 17 Nov 2021 13:55:58 GMT
- Title: Uncertainty Quantification of Surrogate Explanations: an Ordinal
Consensus Approach
- Authors: Jonas Schulz, Rafael Poyiadzi, Raul Santos-Rodriguez
- Abstract summary: We produce estimates of the uncertainty of a given explanation by measuring the consensus amongst a set of diverse bootstrapped surrogate explainers.
We empirically illustrate the properties of this approach through experiments on state-of-the-art Convolutional Neural Network ensembles.
- Score: 1.3750624267664155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explainability of black-box machine learning models is crucial, in particular
when such models are deployed in critical applications such as medicine or autonomous cars.
Existing approaches produce explanations for the predictions of models;
however, how to assess the quality and reliability of such explanations remains
an open question. In this paper we take a step further in order to provide the
practitioner with tools to judge the trustworthiness of an explanation. To this
end, we produce estimates of the uncertainty of a given explanation by
measuring the ordinal consensus amongst a set of diverse bootstrapped surrogate
explainers. While we encourage diversity by using ensemble techniques, we
propose and analyse metrics to aggregate the information contained within the
set of explainers through a rating scheme. We empirically illustrate the
properties of this approach through experiments on state-of-the-art
Convolutional Neural Network ensembles. Furthermore, through tailored
visualisations, we show specific examples of situations where uncertainty
estimates offer concrete actionable insights to the user beyond those arising
from standard surrogate explainers.
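A minimal sketch of the idea, assuming LIME-style local linear surrogates and Kendall's coefficient of concordance W as the ordinal consensus measure (the paper's own rating scheme and aggregation metrics may differ): bootstrap several surrogate explainers around one instance and score the agreement of their feature rankings.
```python
# Minimal sketch, not the authors' implementation: estimate the uncertainty of a
# local surrogate explanation by measuring ordinal consensus among bootstrapped
# LIME-style linear surrogates. Kendall's W is used here as one possible
# consensus measure; the paper's rating scheme and metrics may differ.
import numpy as np
from scipy.stats import rankdata
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# A black-box classifier standing in for e.g. a CNN ensemble.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def surrogate_importances(x, n_perturb=200, kernel_width=1.0):
    """Fit one bootstrapped local linear surrogate around instance x and
    return absolute feature importances (|coefficients|)."""
    # Each call draws a fresh perturbation sample, so repeated calls yield a
    # diverse set of surrogate explainers for the same instance.
    Z = x + rng.normal(scale=0.5, size=(n_perturb, x.shape[0]))
    target = black_box.predict_proba(Z)[:, 1]
    # Weight perturbations by proximity to x (RBF kernel, as in LIME).
    weights = np.exp(-np.sum((Z - x) ** 2, axis=1) / kernel_width ** 2)
    surrogate = Ridge(alpha=1.0).fit(Z, target, sample_weight=weights)
    return np.abs(surrogate.coef_)

def kendalls_w(ranks):
    """Kendall's coefficient of concordance across explainers
    (1 = all explainers agree on the feature ordering, 0 = no agreement)."""
    m, n = ranks.shape                    # m explainers, n features
    rank_sums = ranks.sum(axis=0)         # per-feature rank totals
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

x = X[0]
# A set of diverse bootstrapped surrogate explainers for the same instance.
importances = np.array([surrogate_importances(x) for _ in range(20)])
# Convert importances to ordinal ranks (most important feature = rank 1).
ranks = np.array([rankdata(-imp) for imp in importances])
print("Ordinal consensus (Kendall's W):", round(kendalls_w(ranks), 3))
```
Low values of W would flag explanations whose feature ordering is unstable across the bootstrapped explainers, which is the kind of actionable signal the abstract refers to.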
Related papers
- Towards a Unified Framework for Evaluating Explanations [0.6138671548064356]
We argue that explanations serve as mediators between models and stakeholders, whether for intrinsically interpretable models or opaque black-box models.
We illustrate these criteria, as well as specific evaluation methods, using examples from an ongoing study of an interpretable neural network for predicting a particular learner behavior.
arXiv Detail & Related papers (2024-05-22T21:49:28Z)
- Estimation of Concept Explanations Should be Uncertainty Aware [39.598213804572396]
We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts.
Although popular for their easy interpretation, concept explanations are known to be noisy.
We propose an uncertainty-aware Bayesian estimation method to address these issues, which readily improved the quality of explanations.
arXiv Detail & Related papers (2023-12-13T11:17:27Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution about the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
In the field of process outcome prediction, we define explainability through the interpretability of the explanations and the faithfulness of the explainability model.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
- Framework for Evaluating Faithfulness of Local Explanations [21.648639081403754]
We study the faithfulness of an explanation system to the underlying prediction model.
For a variety of existing explanation systems, such as anchors, we analytically study these quantities.
We provide estimators and sample complexity bounds for empirically determining the faithfulness of black-box explanation systems.
arXiv Detail & Related papers (2022-02-01T20:14:06Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature-based explanations via robustness analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.