Considerations When Learning Additive Explanations for Black-Box Models
- URL: http://arxiv.org/abs/1801.08640v4
- Date: Tue, 1 Aug 2023 02:35:52 GMT
- Title: Considerations When Learning Additive Explanations for Black-Box Models
- Authors: Sarah Tan, Giles Hooker, Paul Koch, Albert Gordo, Rich Caruana
- Abstract summary: We show that different explanation methods characterize non-additive components in a black-box model's prediction function in different ways.
Even though distilled explanations are generally the most accurate additive explanations, non-additive explanations such as tree explanations, which explicitly model non-additive components, tend to be even more accurate.
- Score: 16.732047048577638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many methods to explain black-box models, whether local or global, are
additive. In this paper, we study global additive explanations for non-additive
models, focusing on four explanation methods: partial dependence, Shapley
explanations adapted to a global setting, distilled additive explanations, and
gradient-based explanations. We show that different explanation methods
characterize non-additive components in a black-box model's prediction function
in different ways. We use the concepts of main and total effects to anchor
additive explanations, and quantitatively evaluate additive and non-additive
explanations. Even though distilled explanations are generally the most
accurate additive explanations, non-additive explanations such as tree
explanations that explicitly model non-additive components tend to be even more
accurate. Despite this, our user study showed that machine learning
practitioners were better able to leverage additive explanations for various
tasks. These considerations should be taken into account when deciding which
explanation to trust and use to explain black-box models.
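All four methods summarized above seek an additive summary of the black box, roughly F(x) ≈ h_0 + Σ_i h_i(x_i), and differ in how each h_i is constructed and in where the non-additive remainder goes. As a concrete illustration, here is a minimal sketch, not the authors' code, of two of the methods: brute-force partial dependence and a distilled additive student. It assumes scikit-learn, a gradient-boosted regressor as a stand-in black box, and synthetic data with a deliberate x0*x1 interaction; the `partial_dependence` helper is a hypothetical name for this sketch.

```python
# Minimal sketch (not the paper's code): two global additive explanations of a
# black-box model -- brute-force partial dependence and a distilled additive
# (spline) student -- using scikit-learn on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(2000, 3))
# Non-additive ground truth: the x0*x1 interaction has no exact additive form.
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=2000)

black_box = GradientBoostingRegressor(random_state=0).fit(X, y)  # the "teacher"

def partial_dependence(model, X, feature, grid_size=25):
    """PD of one feature: mean prediction with that column forced to each grid value."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_size)
    curve = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v  # intervene on one feature, leave the rest as observed
        curve.append(model.predict(Xv).mean())
    return grid, np.array(curve)

grid, pd_curve = partial_dependence(black_box, X, feature=0)

# Distilled additive explanation: fit an additive student (per-feature spline
# basis + linear head, hence additive in the raw features) to the teacher's
# predictions rather than to the labels.
teacher_preds = black_box.predict(X)
student = make_pipeline(SplineTransformer(n_knots=10), Ridge(alpha=1.0))
student.fit(X, teacher_preds)

# Fidelity of the additive student to the black box; the interaction term
# bounds how closely any additive explanation can match the teacher.
fidelity_rmse = np.sqrt(np.mean((student.predict(X) - teacher_preds) ** 2))
print(f"PD(x0) range: [{pd_curve.min():.2f}, {pd_curve.max():.2f}]")
print(f"additive student fidelity RMSE: {fidelity_rmse:.3f}")
```

The student's fidelity error is bounded below by the interaction it cannot represent; a tree-based student that models interactions explicitly would fit the teacher more closely, which is the accuracy-versus-usability trade-off the abstract describes.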
Related papers
- Selective Explanations [14.312717332216073]
Amortized explainers train a machine learning model to predict feature attribution scores with only a single inference.
Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations.
We propose selective explanations, a novel feature attribution method that detects when amortized explainers generate low-quality explanations.
arXiv Detail & Related papers (2024-05-29T23:08:31Z)
- How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors? [12.993027779814478]
We ask the question: can popular feature-additive explainers (e.g., LIME, SHAP, SHAPR, MAPLE, and PDP) explain feature-additive predictors?
Herein, we evaluate such explainers on ground truth that is analytically derived from the additive structure of a model (see the worked sketch after this list).
Our results suggest that all explainers eventually fail to correctly attribute the importance of features, especially when a decision-making process involves feature interactions.
arXiv Detail & Related papers (2023-10-27T21:16:28Z)
- Dissenting Explanations: Leveraging Disagreement to Reduce Model Overreliance [4.962171160815189]
We introduce the notion of dissenting explanations: conflicting predictions with accompanying explanations.
We first explore the advantage of dissenting explanations in the setting of model multiplicity.
We demonstrate that dissenting explanations reduce overreliance on model predictions, without reducing overall accuracy.
arXiv Detail & Related papers (2023-07-14T21:27:00Z)
- Learning with Explanation Constraints [91.23736536228485]
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z)
- Learning to Scaffold: Optimizing Model Explanations for Teaching [74.25464914078826]
We train models on three natural language processing and computer vision tasks.
We find that students trained with explanations extracted with our framework are able to simulate the teacher significantly more effectively than ones produced with previous methods.
arXiv Detail & Related papers (2022-04-22T16:43:39Z)
- Benchmarking and Survey of Explanation Methods for Black Box Models [9.747543620322956]
We provide a categorization of explanation methods based on the type of explanation returned.
We present the most recent and widely used explainers, and we show a visual comparison among explanations and a quantitative benchmarking.
arXiv Detail & Related papers (2021-02-25T18:50:29Z)
- Evaluating Explanations: How much do explanations from the teacher aid students? [103.05037537415811]
We formalize the value of explanations using a student-teacher paradigm that measures the extent to which explanations improve student models in learning.
Unlike many prior proposals to evaluate explanations, our approach cannot be easily gamed, enabling principled, scalable, and automatic evaluation of attributions.
arXiv Detail & Related papers (2020-12-01T23:40:21Z)
- Towards Interpretable Natural Language Understanding with Explanations as Latent Variables [146.83882632854485]
We develop a framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.
Our framework treats natural language explanations as latent variables that model the underlying reasoning process of a neural model.
arXiv Detail & Related papers (2020-10-24T02:05:56Z)
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
- Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
arXiv Detail & Related papers (2020-09-03T19:02:55Z)
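To make the "analytically derived ground truth" idea from the feature-additive explainers entry concrete: for a purely additive model f(x) = Σ_i f_i(x_i), the interventional Shapley value of feature i at a point x collapses to f_i(x_i) − E[f_i(X_i)], so explainer output can be checked against an exact answer. The sketch below is an illustration under that assumption only (NumPy, synthetic components, a hypothetical `shapley_brute_force` helper), not code from any of the listed papers.

```python
# Minimal sketch (an assumption-laden illustration, not any paper's code):
# for a purely additive model f(x) = sum_i f_i(x_i), the interventional
# Shapley value of feature i at x is f_i(x_i) - E[f_i(X_i)], giving analytic
# ground truth against which feature-additive explainers can be evaluated.
from itertools import combinations
from math import factorial
import numpy as np

rng = np.random.default_rng(1)
background = rng.normal(size=(1000, 3))
components = [np.sin, np.square, lambda z: 0.5 * z]  # f_0, f_1, f_2

def f(X):
    return sum(g(X[:, i]) for i, g in enumerate(components))

def shapley_brute_force(x, background):
    """Exact interventional Shapley values by enumerating all feature subsets."""
    d = len(x)
    phi = np.zeros(d)
    def v(S):
        Xb = background.copy()
        Xb[:, list(S)] = x[list(S)]  # fix features in S, average out the rest
        return f(Xb).mean()
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in combinations(others, k):
                w = factorial(k) * factorial(d - k - 1) / factorial(d)
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi

x = np.array([1.0, -0.5, 2.0])
analytic = np.array([g(np.array([xi]))[0] - g(background[:, i]).mean()
                     for i, (g, xi) in enumerate(zip(components, x))])
print("brute force:", shapley_brute_force(x, background).round(4))
print("analytic   :", analytic.round(4))  # matches: no interactions to split
```

Once an interaction term is added to f, no such per-feature closed form exists and different explainers split the interaction differently, which is exactly the failure mode that entry and the main paper both report.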