Locally Invariant Explanations: Towards Stable and Unidirectional Explanations through Local Invariant Learning
- URL: http://arxiv.org/abs/2201.12143v2
- Date: Tue, 3 Oct 2023 13:58:09 GMT
- Title: Locally Invariant Explanations: Towards Stable and Unidirectional Explanations through Local Invariant Learning
- Authors: Amit Dhurandhar, Karthikeyan Ramamurthy, Kartik Ahuja and Vijay Arya
- Abstract summary: We propose a model agnostic local explanation method inspired by the invariant risk minimization principle.
Our algorithm is simple and efficient to train, and can ascertain stable input features for local decisions of a black-box without access to side information.
- Score: 15.886405745163234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The locally interpretable model-agnostic explanations (LIME) method is one of the most popular methods used to explain black-box models at a per-example level. Although many variants have been proposed, few provide a simple way to produce high-fidelity explanations that are also stable and intuitive. In this work, we provide a novel perspective by proposing a model-agnostic local explanation method inspired by the invariant risk minimization (IRM) principle, originally proposed for (global) out-of-distribution generalization, to provide such high-fidelity explanations that are also stable and unidirectional across nearby examples. Our method is based on a game-theoretic formulation, and we theoretically show that our approach has a strong tendency to eliminate features for which the gradient of the black-box function abruptly changes sign in the locality of the example we want to explain, while in other cases it is more careful and chooses a more conservative (feature) attribution, a behavior that can be highly desirable for recourse. Empirically, we show on tabular, image, and text data that the quality of our explanations, with neighborhoods formed using random perturbations, is much better than that of LIME and in some cases even comparable to other methods that use realistic neighbors sampled from the data manifold. This is desirable given that learning a manifold to either create realistic neighbors or to project explanations is typically expensive or may even be impossible. Moreover, our algorithm is simple and efficient to train, and can ascertain stable input features for local decisions of a black-box model without access to side information such as a (partial) causal graph, which some recent works require.
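The abstract describes the approach only at a high level. As a rough illustration of the underlying idea, treating several random-perturbation neighborhoods of the example as IRM-style "environments" and fitting one local linear surrogate whose coefficients must work in all of them, here is a minimal sketch. The sampling scheme, the IRMv1-style penalty, the optimizer, and names such as black_box, sigma, and lam are assumptions made for illustration; this is not the paper's game-theoretic algorithm.

```python
# Conceptual sketch only: fit a local linear surrogate across several
# random-perturbation "environments" with an IRMv1-style invariance penalty.
import numpy as np
from scipy.optimize import minimize

def local_invariant_surrogate(black_box, x0, n_envs=5, n_samples=200,
                              sigma=0.1, lam=10.0, seed=0):
    """black_box(X) is assumed to return a 1-D array of predictions;
    x0 is the 1-D example to explain."""
    rng = np.random.default_rng(seed)
    d = x0.shape[0]

    # Each environment is an independent batch of random perturbations of x0,
    # labeled by the black-box model (LIME-style neighborhood sampling).
    envs = []
    for _ in range(n_envs):
        X = x0 + sigma * rng.standard_normal((n_samples, d))
        y = black_box(X)
        envs.append((X, y))

    def objective(w):
        total_risk, total_penalty = 0.0, 0.0
        for X, y in envs:
            resid = X @ w - y
            risk = np.mean(resid ** 2)
            # IRMv1-style penalty: squared gradient of the environment risk
            # w.r.t. a dummy scalar multiplier on the predictions, at 1.0.
            grad_s = 2.0 * np.mean(resid * (X @ w))
            total_risk += risk
            total_penalty += grad_s ** 2
        return total_risk + lam * total_penalty

    w0 = np.zeros(d)
    res = minimize(objective, w0, method="L-BFGS-B")
    return res.x  # local feature attributions for x0
```

In this sketch the returned weight vector plays the role of a LIME-style attribution, but the invariance penalty discourages coefficients that fit only one perturbation batch, which is the intuition behind stability across nearby examples.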
Related papers
- GLIME: General, Stable and Local LIME Explanation [11.002828804775392]
Local Interpretable Model-agnostic Explanations (LIME) is a widely adopted method for understanding model behaviors.
We introduce GLIME, an enhanced framework extending LIME and unifying several prior methods.
By employing a local and unbiased sampling distribution, GLIME generates explanations with higher local fidelity compared to LIME.
arXiv Detail & Related papers (2023-11-27T11:17:20Z)
- Sampling Based On Natural Image Statistics Improves Local Surrogate Explainers [111.31448606885672]
Surrogate explainers are a popular post-hoc interpretability method to further understand how a model arrives at a prediction.
We propose two approaches to incorporate natural image statistics into surrogate explainers, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
arXiv Detail & Related papers (2022-08-08T08:10:13Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features and follow either locally additive or instance-wise approaches.
This work exploits the strengths of both approaches and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness of MACE, showing better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Towards Better Model Understanding with Path-Sufficient Explanations [11.517059323883444]
The Path-Sufficient Explanations Method (PSEM) produces a sequence of sufficient explanations of strictly decreasing size for a given input.
PSEM can be thought to trace the local boundary of the model in a smooth manner, thus providing better intuition about the local model behavior for the specific input.
A user study demonstrates the strength of the method in communicating the local behavior, where (many) users are able to correctly determine the prediction made by the model.
arXiv Detail & Related papers (2021-09-13T16:06:10Z)
- Locally Interpretable Model Agnostic Explanations using Gaussian Processes [2.9189409618561966]
Local Interpretable Model-Agnostic Explanations (LIME) is a popular technique for explaining the prediction of a single instance.
We propose a Gaussian Process (GP)-based variation of locally interpretable models.
We demonstrate that the proposed technique is able to generate faithful explanations using far fewer samples than LIME.
arXiv Detail & Related papers (2021-08-16T05:49:01Z)
- Evaluation of Local Model-Agnostic Explanations Using Ground Truth [4.278336455989584]
Explanation techniques are commonly evaluated using human-grounded methods.
We propose a functionally-grounded evaluation procedure for local model-agnostic explanation techniques.
arXiv Detail & Related papers (2021-06-04T13:47:31Z)
- Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals [72.00815192668193]
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time (a minimal sketch of this removal-based scoring is given after this list).
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
arXiv Detail & Related papers (2021-06-01T20:36:48Z)
- Learning explanations that are hard to vary [75.30552491694066]
We show that averaging across examples can favor memorization and 'patchwork' solutions that sew together different strategies.
We then propose and experimentally validate a simple alternative algorithm based on a logical AND.
arXiv Detail & Related papers (2020-09-01T10:17:48Z)
- Stein Variational Inference for Discrete Distributions [70.19352762933259]
We propose a simple yet general framework that transforms discrete distributions to equivalent piecewise continuous distributions.
Our method outperforms traditional algorithms such as Gibbs sampling and discontinuous Hamiltonian Monte Carlo.
We demonstrate that our method provides a promising tool for learning ensembles of binarized neural networks (BNNs).
In addition, such a transform can be straightforwardly employed in the gradient-free kernelized Stein discrepancy to perform goodness-of-fit (GOF) tests on discrete distributions.
arXiv Detail & Related papers (2020-03-01T22:45:41Z)
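As referenced in the "Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations" entry above, removal-based feature importance can be illustrated with a short, generic sketch. The baseline-imputation choice and the names model, x, and baseline are assumptions for illustration; this is not that paper's exact procedure.

```python
# Generic sketch of removal-based feature importance: score each feature by how
# much the model's confidence in its original prediction drops when that
# feature is replaced with a baseline value.
import numpy as np

def removal_importance(model, x, baseline):
    """model(X) is assumed to return class probabilities of shape (n, n_classes);
    x and baseline are 1-D feature vectors of equal length."""
    probs = model(x[None, :])[0]
    target = int(np.argmax(probs))      # class the model originally predicts
    base_conf = probs[target]

    scores = np.zeros_like(x, dtype=float)
    for j in range(x.shape[0]):
        x_removed = x.copy()
        x_removed[j] = baseline[j]      # "remove" feature j by imputing a baseline
        conf = model(x_removed[None, :])[0][target]
        scores[j] = base_conf - conf    # larger drop => feature j was more important
    return scores
```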
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.