Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post hoc Explanations
- URL: http://arxiv.org/abs/2206.01254v1
- Date: Thu, 2 Jun 2022 19:09:30 GMT
- Title: Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post hoc Explanations
- Authors: Tessa Han, Suraj Srinivas, Himabindu Lakkaraju
- Abstract summary: We show that popular explanation methods are instances of the local function approximation (LFA) framework.
We set forth a guiding principle based on the function approximation perspective, considering a method to be effective if it recovers the underlying model.
We empirically validate our theoretical results using various real-world datasets, model classes, and prediction tasks.
- Score: 16.678003262147346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the plethora of post hoc model explanation methods, the basic
properties and behavior of these methods and the conditions under which each
one is effective are not well understood. In this work, we bridge these gaps
and address a fundamental question: Which explanation method should one use in
a given situation? To this end, we adopt a function approximation perspective
and formalize the local function approximation (LFA) framework. We show that
popular explanation methods are instances of this framework, performing
function approximations of the underlying model in different neighborhoods
using different loss functions. We introduce a no free lunch theorem for
explanation methods which demonstrates that no single method can perform
optimally across all neighborhoods and calls for choosing among methods. To
choose among methods, we set forth a guiding principle based on the function
approximation perspective, considering a method to be effective if it recovers
the underlying model when the model is a member of the explanation function
class. Then, we analyze the conditions under which popular explanation methods
are effective and provide recommendations for choosing among explanation
methods and creating new ones. Lastly, we empirically validate our theoretical
results using various real-world datasets, model classes, and prediction tasks.
By providing a principled mathematical framework which unifies diverse
explanation methods, our work characterizes the behavior of these methods and
their relation to one another, guides the choice of explanation methods, and
paves the way for the creation of new ones.
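To make the function approximation perspective concrete, below is a minimal, hypothetical sketch of a LIME-style explainer read as local function approximation: a linear surrogate is fit to a black-box model f over a perturbation neighborhood of an input x under a squared loss, and the surrogate's coefficients serve as the feature attributions. The function name `local_linear_explanation`, the Gaussian neighborhood, and the loss choice are illustrative assumptions, not code from the paper.

```python
import numpy as np

def local_linear_explanation(f, x, n_samples=1000, sigma=0.5, seed=0):
    """Fit a linear surrogate g(z) = w.z + b to a black-box model f over a
    Gaussian perturbation neighborhood of x, using a squared loss.
    The learned weights w act as per-feature attributions at x."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # 1) Sample the local neighborhood: Gaussian perturbations of x.
    Z = x + sigma * rng.standard_normal((n_samples, d))

    # 2) Query the underlying (black-box) model on the perturbed points.
    y = np.array([f(z) for z in Z])

    # 3) Least-squares fit of the linear surrogate (bias column appended).
    A = np.hstack([Z, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]  # attributions w, intercept b

# Toy usage: explain a simple nonlinear "model" at one input.
if __name__ == "__main__":
    f = lambda z: float(np.tanh(2.0 * z[0]) + 0.5 * z[1] ** 2)
    x = np.array([0.1, -0.3, 0.7])
    w, b = local_linear_explanation(f, x)
    print("local feature attributions:", w)
```

In this reading, the abstract's claim is that popular methods differ mainly in the neighborhood distribution and the loss they use: swapping the Gaussian perturbations above for, say, binary feature masks with a weighted loss would yield a different method from the same template, which is the sense in which the LFA framework treats diverse explanation methods as instances of one recipe.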
Related papers
- An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z)
- Impossibility Theorems for Feature Attribution [21.88229793890961]
We show that for moderately rich model classes, any feature attribution method can provably fail to improve on random guessing for inferring model behaviour.
Our results apply to common end-tasks such as characterizing local model behaviour, identifying spurious features, and algorithmic recourse.
arXiv Detail & Related papers (2022-12-22T17:03:57Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, with better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Topological Representations of Local Explanations [8.559625821116454]
We propose a topology-based framework to extract a simplified representation from a set of local explanations.
We demonstrate that our framework not only reliably identifies differences between explainability techniques but also provides stable representations.
arXiv Detail & Related papers (2022-01-06T17:46:45Z)
- Evaluating Explanations: How much do explanations from the teacher aid students? [103.05037537415811]
We formalize the value of explanations using a student-teacher paradigm that measures the extent to which explanations improve student models in learning.
Unlike many prior proposals to evaluate explanations, our approach cannot be easily gamed, enabling principled, scalable, and automatic evaluation of attributions.
arXiv Detail & Related papers (2020-12-01T23:40:21Z)
- Explaining by Removing: A Unified Framework for Model Explanation [14.50261153230204]
Removal-based explanations are based on the principle of simulating feature removal to quantify each feature's influence.
We develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence.
This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature.
arXiv Detail & Related papers (2020-11-21T00:47:48Z)
- Feature Removal Is a Unifying Principle for Model Explanation Methods [14.50261153230204]
We examine the literature and find that many methods are based on a shared principle of explaining by removing.
We develop a framework for removal-based explanations that characterizes each method along three dimensions.
Our framework unifies 26 existing methods, including several of the most widely used approaches.
arXiv Detail & Related papers (2020-11-06T22:37:55Z)
- Learning the Truth From Only One Side of the Story [58.65439277460011]
We focus on generalized linear models and show that without adjusting for this sampling bias, the model may converge suboptimally or even fail to converge to the optimal solution.
We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically.
arXiv Detail & Related papers (2020-06-08T18:20:28Z)
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [97.77183117452235]
We carry out human subject tests to isolate the effect of algorithmic explanations on model interpretability.
Clear evidence of method effectiveness is found in very few cases.
Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability.
arXiv Detail & Related papers (2020-05-04T20:35:17Z)
- There and Back Again: Revisiting Backpropagation Saliency Methods [87.40330595283969]
Saliency methods seek to explain the predictions of a model by producing an importance map for each input sample.
A popular class of such methods is based on backpropagating a signal and analyzing the resulting gradient.
We propose a single framework under which several such methods can be unified.
arXiv Detail & Related papers (2020-04-06T17:58:08Z)
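As a small companion to the backpropagation-saliency entry above, here is a hedged sketch of the gradient x input recipe that many such methods build on. To stay dependency-free it approximates the gradient with central finite differences rather than actual backpropagation; the function name `gradient_times_input` and the toy model are illustrative assumptions, not code from that paper.

```python
import numpy as np

def gradient_times_input(f, x, eps=1e-4):
    """Backprop-style saliency approximated numerically: importance_i is
    x_i * df(x)/dx_i, with the gradient estimated by central differences."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return x * grad  # per-feature importance map for input x

# Toy usage: importance map for a simple scalar-valued "model".
if __name__ == "__main__":
    f = lambda z: float(np.tanh(2.0 * z[0]) + 0.5 * z[1] ** 2)
    x = np.array([0.1, -0.3, 0.7])
    print("gradient x input saliency:", gradient_times_input(f, x))
```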
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.