Locally Invariant Explanations: Towards Stable and Unidirectional Explanations through Local Invariant Learning
- URL: http://arxiv.org/abs/2201.12143v2
- Date: Tue, 3 Oct 2023 13:58:09 GMT
- Title: Locally Invariant Explanations: Towards Stable and Unidirectional Explanations through Local Invariant Learning
- Authors: Amit Dhurandhar, Karthikeyan Ramamurthy, Kartik Ahuja and Vijay Arya
- Abstract summary: We propose a model agnostic local explanation method inspired by the invariant risk minimization principle.
Our algorithm is simple and efficient to train, and can ascertain stable input features for local decisions of a black-box without access to side information.
- Score: 15.886405745163234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The locally interpretable model-agnostic explanations (LIME) method is one of the most popular methods used to explain black-box models at a per-example level. Although many variants have been proposed, few provide a simple way to produce high-fidelity explanations that are also stable and intuitive. In this work, we provide a novel perspective by proposing a model-agnostic local explanation method inspired by the invariant risk minimization (IRM) principle, originally proposed for (global) out-of-distribution generalization, to provide such high-fidelity explanations that are also stable and unidirectional across nearby examples. Our method is based on a game-theoretic formulation, and we theoretically show that our approach has a strong tendency to eliminate features for which the gradient of the black-box function abruptly changes sign in the locality of the example we want to explain, while in other cases it is more careful and chooses a more conservative (feature) attribution, a behavior that can be highly desirable for recourse. Empirically, we show on tabular, image, and text data that the quality of our explanations, with neighborhoods formed using random perturbations, is much better than that of LIME and in some cases even comparable to other methods that use realistic neighbors sampled from the data manifold. This is desirable given that learning a manifold to either create realistic neighbors or to project explanations is typically expensive or may even be impossible. Moreover, our algorithm is simple and efficient to train, and can ascertain stable input features for local decisions of a black-box model without access to side information such as a (partial) causal graph, which some recent works require.
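The abstract describes the approach only at a high level. As a rough illustration of the underlying idea, treating several random-perturbation neighborhoods of the example as IRM-style "environments" and fitting one local linear surrogate whose coefficients must work in all of them, here is a minimal sketch. The sampling scheme, the IRMv1-style penalty, the optimizer, and names such as black_box, sigma, and lam are assumptions made for illustration; this is not the paper's game-theoretic algorithm.

```python
# Conceptual sketch only: fit a local linear surrogate across several
# random-perturbation "environments" with an IRMv1-style invariance penalty.
import numpy as np
from scipy.optimize import minimize

def local_invariant_surrogate(black_box, x0, n_envs=5, n_samples=200,
                              sigma=0.1, lam=10.0, seed=0):
    """black_box(X) is assumed to return a 1-D array of predictions;
    x0 is the 1-D example to explain."""
    rng = np.random.default_rng(seed)
    d = x0.shape[0]

    # Each environment is an independent batch of random perturbations of x0,
    # labeled by the black-box model (LIME-style neighborhood sampling).
    envs = []
    for _ in range(n_envs):
        X = x0 + sigma * rng.standard_normal((n_samples, d))
        y = black_box(X)
        envs.append((X, y))

    def objective(w):
        total_risk, total_penalty = 0.0, 0.0
        for X, y in envs:
            resid = X @ w - y
            risk = np.mean(resid ** 2)
            # IRMv1-style penalty: squared gradient of the environment risk
            # w.r.t. a dummy scalar multiplier on the predictions, at 1.0.
            grad_s = 2.0 * np.mean(resid * (X @ w))
            total_risk += risk
            total_penalty += grad_s ** 2
        return total_risk + lam * total_penalty

    w0 = np.zeros(d)
    res = minimize(objective, w0, method="L-BFGS-B")
    return res.x  # local feature attributions for x0
```

In this sketch the returned weight vector plays the role of a LIME-style attribution, but the invariance penalty discourages coefficients that fit only one perturbation batch, which is the intuition behind stability across nearby examples.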
Related papers
- GLIME: General, Stable and Local LIME Explanation [11.002828804775392]
Local Interpretable Model-agnostic Explanations (LIME) is a widely adopted method for understanding model behaviors.
We introduce GLIME, an enhanced framework extending LIME and unifying several prior methods.
By employing a local and unbiased sampling distribution, GLIME generates explanations with higher local fidelity compared to LIME.
arXiv Detail & Related papers (2023-11-27T11:17:20Z)
- Sampling Based On Natural Image Statistics Improves Local Surrogate Explainers [111.31448606885672]
Surrogate explainers are a popular post-hoc interpretability method to further understand how a model arrives at a prediction.
We propose two approaches to incorporate natural image statistics into surrogate explainers, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
arXiv Detail & Related papers (2022-08-08T08:10:13Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features and follow either locally additive or instance-wise approaches.
This work exploits the strengths of both approaches and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness of MACE, showing better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Towards Better Model Understanding with Path-Sufficient Explanations [11.517059323883444]
The Path-Sufficient Explanations Method (PSEM) produces a sequence of sufficient explanations of strictly decreasing size for a given input.
PSEM can be thought to trace the local boundary of the model in a smooth manner, thus providing better intuition about the local model behavior for the specific input.
A user study demonstrates the strength of the method in communicating the local behavior, where (many) users are able to correctly determine the prediction made by the model.
arXiv Detail & Related papers (2021-09-13T16:06:10Z)
- Locally Interpretable Model Agnostic Explanations using Gaussian Processes [2.9189409618561966]
Local Interpretable Model-Agnostic Explanations (LIME) is a popular technique for explaining the prediction of a single instance.
We propose a Gaussian Process (GP)-based variation of locally interpretable models.
We demonstrate that the proposed technique is able to generate faithful explanations using far fewer samples than LIME.
arXiv Detail & Related papers (2021-08-16T05:49:01Z)
- Evaluation of Local Model-Agnostic Explanations Using Ground Truth [4.278336455989584]
Explanation techniques are commonly evaluated using human-grounded methods.
We propose a functionally-grounded evaluation procedure for local model-agnostic explanation techniques.
arXiv Detail & Related papers (2021-06-04T13:47:31Z)
- Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals [72.00815192668193]
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time (a minimal sketch of this removal-based scoring is given after this list).
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
arXiv Detail & Related papers (2021-06-01T20:36:48Z)
- Learning explanations that are hard to vary [75.30552491694066]
We show that averaging across examples can favor memorization and 'patchwork' solutions that sew together different strategies.
We then propose and experimentally validate a simple alternative algorithm based on a logical AND.
arXiv Detail & Related papers (2020-09-01T10:17:48Z)
- Stein Variational Inference for Discrete Distributions [70.19352762933259]
We propose a simple yet general framework that transforms discrete distributions to equivalent piecewise continuous distributions.
Our method outperforms traditional algorithms such as Gibbs sampling and discontinuous Hamiltonian Monte Carlo.
We demonstrate that our method provides a promising tool for learning ensembles of binarized neural networks (BNNs).
In addition, such a transform can be straightforwardly employed in the gradient-free kernelized Stein discrepancy to perform goodness-of-fit (GOF) tests on discrete distributions.
arXiv Detail & Related papers (2020-03-01T22:45:41Z)
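As referenced in the "Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations" entry above, removal-based feature importance can be illustrated with a short, generic sketch. The baseline-imputation choice and the names model, x, and baseline are assumptions for illustration; this is not that paper's exact procedure.

```python
# Generic sketch of removal-based feature importance: score each feature by how
# much the model's confidence in its original prediction drops when that
# feature is replaced with a baseline value.
import numpy as np

def removal_importance(model, x, baseline):
    """model(X) is assumed to return class probabilities of shape (n, n_classes);
    x and baseline are 1-D feature vectors of equal length."""
    probs = model(x[None, :])[0]
    target = int(np.argmax(probs))      # class the model originally predicts
    base_conf = probs[target]

    scores = np.zeros_like(x, dtype=float)
    for j in range(x.shape[0]):
        x_removed = x.copy()
        x_removed[j] = baseline[j]      # "remove" feature j by imputing a baseline
        conf = model(x_removed[None, :])[0][target]
        scores[j] = base_conf - conf    # larger drop => feature j was more important
    return scores
```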
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.