Explaining by Removing: A Unified Framework for Model Explanation
- URL: http://arxiv.org/abs/2011.14878v2
- Date: Fri, 13 May 2022 03:43:44 GMT
- Title: Explaining by Removing: A Unified Framework for Model Explanation
- Authors: Ian Covert, Scott Lundberg, Su-In Lee
- Abstract summary: Removal-based explanations are based on the principle of simulating feature removal to quantify each feature's influence.
We develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence.
This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature.
- Score: 14.50261153230204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Researchers have proposed a wide variety of model explanation approaches, but
it remains unclear how most methods are related or when one method is
preferable to another. We describe a new unified class of methods,
removal-based explanations, that are based on the principle of simulating
feature removal to quantify each feature's influence. These methods vary in
several respects, so we develop a framework that characterizes each method
along three dimensions: 1) how the method removes features, 2) what model
behavior the method explains, and 3) how the method summarizes each feature's
influence. Our framework unifies 26 existing methods, including several of the
most widely used approaches: SHAP, LIME, Meaningful Perturbations, and
permutation tests. This newly understood class of explanation methods has rich
connections that we examine using tools that have been largely overlooked by
the explainability literature. To anchor removal-based explanations in
cognitive psychology, we show that feature removal is a simple application of
subtractive counterfactual reasoning. Ideas from cooperative game theory shed
light on the relationships and trade-offs among different methods, and we
derive conditions under which all removal-based explanations have
information-theoretic interpretations. Through this analysis, we develop a
unified framework that helps practitioners better understand model explanation
tools, and that offers a strong theoretical foundation upon which future
explainability research can build.
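For intuition, the sketch below illustrates the removal-based recipe with a permutation-test-style importance measure. It is not the paper's implementation; the linear model, synthetic data, and squared-error loss are placeholder assumptions chosen only to make the framework's three dimensions concrete.

```python
# Minimal sketch of a removal-based explanation (permutation-test style).
# Placeholder model/data/loss; comments mark the framework's three dimensions.
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data and model (assumptions for illustration only).
X = rng.normal(size=(500, 4))
true_w = np.array([2.0, -1.0, 0.5, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=500)
model = lambda X: X @ true_w          # stands in for any trained predictor

def expected_loss(X, y):
    """Dimension 2: the model behavior being explained (here, mean squared error)."""
    return np.mean((model(X) - y) ** 2)

def remove_feature(X, j):
    """Dimension 1: 'remove' feature j by permuting it, breaking its link to y."""
    X_removed = X.copy()
    X_removed[:, j] = rng.permutation(X_removed[:, j])
    return X_removed

baseline = expected_loss(X, y)
for j in range(X.shape[1]):
    # Dimension 3: summarize feature j's influence as the average increase in loss
    # caused by removing it.
    losses = [expected_loss(remove_feature(X, j), y) for _ in range(20)]
    print(f"feature {j}: importance ~ {np.mean(losses) - baseline:.3f}")
```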
Related papers
- An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z)
- Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance [72.50214227616728]
Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group.
arXiv Detail & Related papers (2023-04-13T17:59:03Z)
- Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post hoc Explanations [16.678003262147346]
We show that popular explanation methods are instances of the local function approximation (LFA) framework.
We set forth a guiding principle based on the function approximation perspective, considering a method to be effective if it recovers the underlying model.
We empirically validate our theoretical results using various real world datasets, model classes, and prediction tasks.
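As a rough illustration of what local function approximation means in practice, here is a minimal LIME-style sketch rather than the paper's exact formulation: a weighted linear surrogate is fit to a placeholder black-box model around a single input, and its coefficients serve as the local explanation. The black-box function, kernel width, and sampling scheme are illustrative assumptions.

```python
# Toy local function approximation: fit a weighted linear surrogate near one input.
# The black-box model, kernel width, and sampling scheme are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

black_box = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2    # placeholder model
x0 = np.array([0.5, -0.3])                              # instance to explain

# Sample perturbations around x0 and weight them by proximity to x0.
Z = x0 + rng.normal(scale=0.5, size=(200, 2))
weights = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.25)

# Weighted least squares: the surrogate's coefficients act as per-feature attributions.
A = np.hstack([np.ones((len(Z), 1)), Z])                # intercept + features
sw = np.sqrt(weights)
coef, *_ = np.linalg.lstsq(A * sw[:, None], black_box(Z) * sw, rcond=None)
print("local linear attributions:", coef[1:])
```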
arXiv Detail & Related papers (2022-06-02T19:09:30Z)
- Topological Representations of Local Explanations [8.559625821116454]
We propose a topology-based framework to extract a simplified representation from a set of local explanations.
We demonstrate that our framework can not only reliably identify differences between explainability techniques but also provide stable representations.
arXiv Detail & Related papers (2022-01-06T17:46:45Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting the model representation into a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide more accurate and finer-grained interpretability of a model's decisions.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- Explaining Natural Language Processing Classifiers with Occlusion and Language Modeling [4.9342793303029975]
We present a novel explanation method, called OLM, for natural language processing classifiers.
OLM gives explanations that are theoretically sound and easy to understand.
We make several contributions to the theory of explanation methods.
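For a rough sense of the occlusion side of this approach, the toy sketch below scores each word by the prediction change when it is dropped. The keyword-counting "classifier" is purely a placeholder, and unlike OLM this sketch does not sample replacement words from a language model.

```python
# Toy occlusion explanation for text: importance of a word = change in the score
# when that word is removed. The keyword scorer is a placeholder classifier, and
# this sketch omits OLM's language-model-based resampling of the occluded word.
def toy_sentiment_score(tokens):
    positive, negative = {"great", "good", "love"}, {"bad", "awful", "boring"}
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)

def occlusion_importance(tokens, score_fn):
    full = score_fn(tokens)
    return [(t, full - score_fn(tokens[:i] + tokens[i + 1:]))
            for i, t in enumerate(tokens)]

tokens = "the plot was boring but the acting was great".split()
print(occlusion_importance(tokens, toy_sentiment_score))
```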
arXiv Detail & Related papers (2021-01-28T09:44:04Z)
- Evaluating Explanations: How much do explanations from the teacher aid students? [103.05037537415811]
We formalize the value of explanations using a student-teacher paradigm that measures the extent to which explanations improve student models in learning.
Unlike many prior proposals to evaluate explanations, our approach cannot be easily gamed, enabling principled, scalable, and automatic evaluation of attributions.
arXiv Detail & Related papers (2020-12-01T23:40:21Z)
- Feature Removal Is a Unifying Principle for Model Explanation Methods [14.50261153230204]
We examine the literature and find that many methods are based on a shared principle of explaining by removing.
We develop a framework for removal-based explanations that characterizes each method along three dimensions.
Our framework unifies 26 existing methods, including several of the most widely used approaches.
arXiv Detail & Related papers (2020-05-04T20:35:17Z)
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [97.77183117452235]
We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
arXiv Detail & Related papers (2020-04-06T17:58:08Z)
- There and Back Again: Revisiting Backpropagation Saliency Methods [87.40330595283969]
Saliency methods seek to explain the predictions of a model by producing an importance map across each input sample.
A popular class of such methods is based on backpropagating a signal and analyzing the resulting gradient.
We propose a single framework under which several such methods can be unified.
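To make the backpropagation-based family concrete, here is a tiny gradient-times-input sketch on a logistic model whose gradient is available in closed form. The weights and input are illustrative assumptions; in practice the gradient would come from backpropagation through a deep network, and gradient-times-input is only one member of the class of methods the paper unifies.

```python
# Toy gradient-times-input saliency. The logistic model below is a placeholder
# with an analytic gradient; real saliency methods backpropagate through a network.
import numpy as np

w = np.array([1.5, -2.0, 0.3])        # placeholder model weights (assumption)
x = np.array([0.8, 0.1, -1.2])        # input sample to explain

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p = sigmoid(w @ x)
grad = p * (1.0 - p) * w              # d p / d x for the logistic model
saliency = grad * x                   # gradient-times-input importance map
print("prediction:", p, "saliency per feature:", saliency)
```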
arXiv Detail & Related papers (2020-04-06T17:58:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.