The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal
Sufficient Subsets
- URL: http://arxiv.org/abs/2009.11023v2
- Date: Mon, 14 Dec 2020 13:46:26 GMT
- Title: The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal
Sufficient Subsets
- Authors: Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas
Lukasiewicz, Phil Blunsom
- Abstract summary: We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
- Score: 61.66584140190247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For neural models to garner widespread public trust and ensure fairness, we
must have human-intelligible explanations for their predictions. Recently, an
increasing number of works focus on explaining the predictions of neural models
in terms of the relevance of the input features. In this work, we show that
feature-based explanations pose problems even for explaining trivial models. We
show that, in certain cases, there exist at least two ground-truth
feature-based explanations, and that, sometimes, neither of them is enough to
provide a complete view of the decision-making process of the model. Moreover,
we show that two popular classes of explainers, Shapley explainers and minimal
sufficient subsets explainers, target fundamentally different types of
ground-truth explanations, despite the apparently implicit assumption that
explainers should look for one specific feature-based explanation. These
findings bring an additional dimension to consider in both developing and
choosing explainers.
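To make the contrast concrete, below is a minimal sketch on a hypothetical two-feature OR model with a zero baseline (chosen for illustration, not taken from the paper's own experiments). It computes exact Shapley values and enumerates minimal sufficient subsets for the same prediction, showing that the two explanation types can disagree even on a trivial model:

```python
from itertools import chain, combinations
from math import factorial

def model(x1, x2):
    """Trivial model: predicts 1 iff at least one feature is active (logical OR)."""
    return int(bool(x1) or bool(x2))

# Instance to explain, and a baseline value used for "absent" features.
instance = {"x1": 1, "x2": 1}
baseline = {"x1": 0, "x2": 0}
features = list(instance)
n = len(features)

def value(subset):
    """Model output when only the features in `subset` keep their instance values."""
    inputs = {f: (instance[f] if f in subset else baseline[f]) for f in features}
    return model(**inputs)

def powerset(items):
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

# Exact Shapley values: weighted average of marginal contributions over all coalitions.
shapley = {}
for f in features:
    others = [g for g in features if g != f]
    phi = 0.0
    for s in powerset(others):
        weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
        phi += weight * (value(set(s) | {f}) - value(set(s)))
    shapley[f] = phi

# Minimal sufficient subsets: smallest feature sets that alone preserve the prediction.
target = value(set(features))
sufficient = [set(s) for s in powerset(features) if value(set(s)) == target]
minimal = [s for s in sufficient if not any(t < s for t in sufficient)]

print("Shapley values:", shapley)              # {'x1': 0.5, 'x2': 0.5}
print("Minimal sufficient subsets:", minimal)  # [{'x1'}, {'x2'}]
```

For the input (1, 1), the Shapley explainer splits the credit equally between the two features, whereas the minimal sufficient subsets are {x1} and {x2} individually; an explainer built for one notion of ground truth can therefore appear wrong when judged against the other, which is the kind of mismatch the abstract describes.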
Related papers
- Sufficient and Necessary Explanations (and What Lies in Between) [6.9035001722324685]
We study two precise notions of feature importance for general machine learning models: sufficiency and necessity.
We propose a unified notion of importance that circumvents these limitations by exploring a continuum along a necessity-sufficiency axis.
arXiv Detail & Related papers (2024-09-30T15:50:57Z)
- Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation [0.9558392439655016]
The ability to interpret Machine Learning (ML) models is becoming increasingly essential.
Recent work has demonstrated that it is possible to formally assess interpretability by studying the computational complexity of explaining the decisions of various models.
arXiv Detail & Related papers (2024-08-07T17:20:52Z)
- Dissenting Explanations: Leveraging Disagreement to Reduce Model Overreliance [4.962171160815189]
We introduce the notion of dissenting explanations: conflicting predictions with accompanying explanations.
We first explore the advantage of dissenting explanations in the setting of model multiplicity.
We demonstrate that dissenting explanations reduce overreliance on model predictions, without reducing overall accuracy.
arXiv Detail & Related papers (2023-07-14T21:27:00Z)
- Learning with Explanation Constraints [91.23736536228485]
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z)
- ExSum: From Local Explanations to Model Understanding [6.23934576145261]
Interpretability methods are developed to understand the working mechanisms of black-box models.
Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them.
We introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding.
arXiv Detail & Related papers (2022-04-30T02:07:20Z)
- Do not explain without context: addressing the blind spot of model explanations [2.280298858971133]
This paper highlights a blind spot which is often overlooked when monitoring and auditing machine learning models.
We discuss that many model explanations depend directly or indirectly on the choice of the referenced data distribution.
We showcase examples where small changes in the distribution lead to drastic changes in the explanations, such as a change in trend or, alarmingly, in the conclusion.
arXiv Detail & Related papers (2021-05-28T12:48:40Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction [49.254162397086006]
We study explanations based on visual saliency in an image-based age prediction task.
We find that presenting model predictions improves human accuracy.
However, explanations of various kinds fail to significantly alter human accuracy or trust in the model.
arXiv Detail & Related papers (2020-07-23T20:39:40Z)
- Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature-based explanations by robustness analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)