Explaining by Removing: A Unified Framework for Model Explanation
- URL: http://arxiv.org/abs/2011.14878v2
- Date: Fri, 13 May 2022 03:43:44 GMT
- Title: Explaining by Removing: A Unified Framework for Model Explanation
- Authors: Ian Covert, Scott Lundberg, Su-In Lee
- Abstract summary: Removal-based explanations are based on the principle of simulating feature removal to quantify each feature's influence.
We develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence.
This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature.
- Score: 14.50261153230204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Researchers have proposed a wide variety of model explanation approaches, but
it remains unclear how most methods are related or when one method is
preferable to another. We describe a new unified class of methods,
removal-based explanations, that are based on the principle of simulating
feature removal to quantify each feature's influence. These methods vary in
several respects, so we develop a framework that characterizes each method
along three dimensions: 1) how the method removes features, 2) what model
behavior the method explains, and 3) how the method summarizes each feature's
influence. Our framework unifies 26 existing methods, including several of the
most widely used approaches: SHAP, LIME, Meaningful Perturbations, and
permutation tests. This newly understood class of explanation methods has rich
connections that we examine using tools that have been largely overlooked by
the explainability literature. To anchor removal-based explanations in
cognitive psychology, we show that feature removal is a simple application of
subtractive counterfactual reasoning. Ideas from cooperative game theory shed
light on the relationships and trade-offs among different methods, and we
derive conditions under which all removal-based explanations have
information-theoretic interpretations. Through this analysis, we develop a
unified framework that helps practitioners better understand model explanation
tools, and that offers a strong theoretical foundation upon which future
explainability research can build.
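For intuition, the sketch below illustrates the removal-based recipe with a permutation-test-style importance measure. It is not the paper's implementation; the linear model, synthetic data, and squared-error loss are placeholder assumptions chosen only to make the framework's three dimensions concrete.

```python
# Minimal sketch of a removal-based explanation (permutation-test style).
# Placeholder model/data/loss; comments mark the framework's three dimensions.
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data and model (assumptions for illustration only).
X = rng.normal(size=(500, 4))
true_w = np.array([2.0, -1.0, 0.5, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=500)
model = lambda X: X @ true_w          # stands in for any trained predictor

def expected_loss(X, y):
    """Dimension 2: the model behavior being explained (here, mean squared error)."""
    return np.mean((model(X) - y) ** 2)

def remove_feature(X, j):
    """Dimension 1: 'remove' feature j by permuting it, breaking its link to y."""
    X_removed = X.copy()
    X_removed[:, j] = rng.permutation(X_removed[:, j])
    return X_removed

baseline = expected_loss(X, y)
for j in range(X.shape[1]):
    # Dimension 3: summarize feature j's influence as the average increase in loss
    # caused by removing it.
    losses = [expected_loss(remove_feature(X, j), y) for _ in range(20)]
    print(f"feature {j}: importance ~ {np.mean(losses) - baseline:.3f}")
```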
Related papers
- An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z)
- Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance [72.50214227616728]
Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group.
arXiv Detail & Related papers (2023-04-13T17:59:03Z)
- Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post hoc Explanations [16.678003262147346]
We show that popular explanation methods are instances of the local function approximation (LFA) framework.
We set forth a guiding principle based on the function approximation perspective, considering a method to be effective if it recovers the underlying model.
We empirically validate our theoretical results using various real world datasets, model classes, and prediction tasks.
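As a rough illustration of what local function approximation means in practice, here is a minimal LIME-style sketch rather than the paper's exact formulation: a weighted linear surrogate is fit to a placeholder black-box model around a single input, and its coefficients serve as the local explanation. The black-box function, kernel width, and sampling scheme are illustrative assumptions.

```python
# Toy local function approximation: fit a weighted linear surrogate near one input.
# The black-box model, kernel width, and sampling scheme are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

black_box = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2    # placeholder model
x0 = np.array([0.5, -0.3])                              # instance to explain

# Sample perturbations around x0 and weight them by proximity to x0.
Z = x0 + rng.normal(scale=0.5, size=(200, 2))
weights = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.25)

# Weighted least squares: the surrogate's coefficients act as per-feature attributions.
A = np.hstack([np.ones((len(Z), 1)), Z])                # intercept + features
sw = np.sqrt(weights)
coef, *_ = np.linalg.lstsq(A * sw[:, None], black_box(Z) * sw, rcond=None)
print("local linear attributions:", coef[1:])
```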
arXiv Detail & Related papers (2022-06-02T19:09:30Z)
- Topological Representations of Local Explanations [8.559625821116454]
We propose a topology-based framework to extract a simplified representation from a set of local explanations.
We demonstrate that our framework can not only reliably identify differences between explainability techniques but also provide stable representations.
arXiv Detail & Related papers (2022-01-06T17:46:45Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting the model representation into a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide more accurate and finer-grained interpretability of a model's decisions.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- Explaining Natural Language Processing Classifiers with Occlusion and Language Modeling [4.9342793303029975]
We present a novel explanation method, called OLM, for natural language processing classifiers.
OLM gives explanations that are theoretically sound and easy to understand.
We make several contributions to the theory of explanation methods.
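For a rough sense of the occlusion side of this approach, the toy sketch below scores each word by the prediction change when it is dropped. The keyword-counting "classifier" is purely a placeholder, and unlike OLM this sketch does not sample replacement words from a language model.

```python
# Toy occlusion explanation for text: importance of a word = change in the score
# when that word is removed. The keyword scorer is a placeholder classifier, and
# this sketch omits OLM's language-model-based resampling of the occluded word.
def toy_sentiment_score(tokens):
    positive, negative = {"great", "good", "love"}, {"bad", "awful", "boring"}
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)

def occlusion_importance(tokens, score_fn):
    full = score_fn(tokens)
    return [(t, full - score_fn(tokens[:i] + tokens[i + 1:]))
            for i, t in enumerate(tokens)]

tokens = "the plot was boring but the acting was great".split()
print(occlusion_importance(tokens, toy_sentiment_score))
```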
arXiv Detail & Related papers (2021-01-28T09:44:04Z)
- Evaluating Explanations: How much do explanations from the teacher aid students? [103.05037537415811]
We formalize the value of explanations using a student-teacher paradigm that measures the extent to which explanations improve student models in learning.
Unlike many prior proposals to evaluate explanations, our approach cannot be easily gamed, enabling principled, scalable, and automatic evaluation of attributions.
arXiv Detail & Related papers (2020-12-01T23:40:21Z)
- Feature Removal Is a Unifying Principle for Model Explanation Methods [14.50261153230204]
We examine the literature and find that many methods are based on a shared principle of explaining by removing.
We develop a framework for removal-based explanations that characterizes each method along three dimensions.
Our framework unifies 26 existing methods, including several of the most widely used approaches.
arXiv Detail & Related papers (2020-05-04T20:35:17Z)
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [97.77183117452235]
We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
arXiv Detail & Related papers (2020-04-06T17:58:08Z)
- There and Back Again: Revisiting Backpropagation Saliency Methods [87.40330595283969]
Saliency methods seek to explain the predictions of a model by producing an importance map across each input sample.
A popular class of such methods is based on backpropagating a signal and analyzing the resulting gradient.
We propose a single framework under which several such methods can be unified.
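To make the backpropagation-based family concrete, here is a tiny gradient-times-input sketch on a logistic model whose gradient is available in closed form. The weights and input are illustrative assumptions; in practice the gradient would come from backpropagation through a deep network, and gradient-times-input is only one member of the class of methods the paper unifies.

```python
# Toy gradient-times-input saliency. The logistic model below is a placeholder
# with an analytic gradient; real saliency methods backpropagate through a network.
import numpy as np

w = np.array([1.5, -2.0, 0.3])        # placeholder model weights (assumption)
x = np.array([0.8, 0.1, -1.2])        # input sample to explain

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p = sigmoid(w @ x)
grad = p * (1.0 - p) * w              # d p / d x for the logistic model
saliency = grad * x                   # gradient-times-input importance map
print("prediction:", p, "saliency per feature:", saliency)
```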
arXiv Detail & Related papers (2020-04-06T17:58:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.