LIPEx-Locally Interpretable Probabilistic Explanations-To Look Beyond The True Class
- URL: http://arxiv.org/abs/2310.04856v2
- Date: Thu, 7 Dec 2023 10:02:06 GMT
- Title: LIPEx-Locally Interpretable Probabilistic Explanations-To Look Beyond The True Class
- Authors: Hongbo Zhu, Angelo Cangelosi, Procheta Sen and Anirbit Mukherjee
- Abstract summary: LIPEx is a perturbation-based multi-class explanation framework.
It provides insight into how every feature deemed to be important affects the prediction probability for each of the possible classes.
- Score: 17.12486200215929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we instantiate a novel perturbation-based multi-class
explanation framework, LIPEx (Locally Interpretable Probabilistic Explanation).
We demonstrate that LIPEx not only locally replicates the probability
distributions output by the widely used complex classification models but also
provides insight into how every feature deemed to be important affects the
prediction probability for each of the possible classes. We achieve this by
defining the explanation as a matrix obtained via regression with respect to
the Hellinger distance in the space of probability distributions. Ablation
tests on text and image data show that LIPEx-guided removal of important
features from the data causes more change in predictions for the underlying
model than similar tests based on other saliency-based or feature
importance-based Explainable AI (XAI) methods. It is also shown that, compared
to LIME, LIPEx is more data-efficient, requiring fewer perturbations of the
data to obtain a reliable explanation. This
data-efficiency is seen to manifest as LIPEx being able to compute its
explanation matrix around 53% faster than all-class LIME, for classification
experiments with text data.
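The abstract's core construction — an explanation matrix obtained via regression with respect to the Hellinger distance over perturbed predictions — can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the binary perturbation scheme, the exponential locality kernel, and all names (`lipex_explanation`, `kernel_width`) are assumptions made here for illustration.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def lipex_explanation(predict_proba, n_features, n_samples=200,
                      kernel_width=0.25, seed=0):
    """Fit a (classes x features) explanation matrix by weighted least squares
    on binary feature-presence masks, weighting each perturbation by how close
    (in Hellinger distance) its predicted distribution stays to the original."""
    rng = np.random.default_rng(seed)
    base = predict_proba(np.ones(n_features))             # unperturbed prediction
    Z = rng.integers(0, 2, size=(n_samples, n_features))  # binary masks
    P = np.stack([predict_proba(z) for z in Z])           # perturbed distributions
    d = np.array([hellinger(p, base) for p in P])
    w = np.exp(-(d ** 2) / kernel_width ** 2)             # locality weights
    Zb = np.hstack([Z, np.ones((n_samples, 1))])          # intercept column
    sw = np.sqrt(w)[:, None]
    W, *_ = np.linalg.lstsq(Zb * sw, P * sw, rcond=None)  # weighted LS solve
    return W[:-1].T                                       # (n_classes, n_features)
```

Because every target row is a probability distribution summing to one, the fitted feature coefficients for each feature sum to zero across classes: mass attributed toward one class is drawn from the others, which is what lets the matrix "look beyond the true class".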
Related papers
- DUPRE: Data Utility Prediction for Efficient Data Valuation [49.60564885180563]
Cooperative game theory-based data valuation, such as Data Shapley, requires evaluating the data utility and retraining the ML model for multiple data subsets.
Our framework, DUPRE, takes an alternative yet complementary approach that reduces the cost per subset evaluation by predicting data utilities instead of evaluating them by model retraining.
Specifically, given the evaluated data utilities of some data subsets, DUPRE fits a Gaussian process (GP) regression model to predict the utility of every other data subset.
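The GP step described above can be sketched with plain numpy: encode each subset as a binary membership vector, fit GP regression on the subsets whose utility was already evaluated, and read off the predictive mean for the rest. This is an illustrative sketch only — the kernel, encoding, and names (`gp_predict_utility`, `length`) are assumptions, not DUPRE's actual design.

```python
import numpy as np

def rbf_kernel(X1, X2, length=1.0):
    """RBF (squared-exponential) kernel between two sets of vectors."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_predict_utility(S_train, u_train, S_query, noise=1e-6, length=1.0):
    """GP regression predictive mean: given utilities u_train of subsets
    encoded as binary membership vectors S_train, predict the utility of
    each query subset without retraining the underlying model."""
    K = rbf_kernel(S_train, S_train, length) + noise * np.eye(len(S_train))
    Ks = rbf_kernel(S_query, S_train, length)
    alpha = np.linalg.solve(K, u_train)   # K^{-1} u, via a linear solve
    return Ks @ alpha
```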
arXiv Detail & Related papers (2025-02-22T08:53:39Z)
- Structural Entropy Guided Probabilistic Coding [52.01765333755793]
We propose a novel structural entropy-guided probabilistic coding model, named SEPC.
We incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss.
Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC.
arXiv Detail & Related papers (2024-12-12T00:37:53Z)
- Graph-based Complexity for Causal Effect by Empirical Plug-in [56.14597641617531]
This paper focuses on the computational complexity of computing empirical plug-in estimates for causal effect queries.
We show that computation can be done efficiently, potentially in time linear in the data size, depending on the estimand's hypergraph.
arXiv Detail & Related papers (2024-11-15T07:42:01Z)
- Estimating Causal Effects from Learned Causal Networks [56.14597641617531]
We propose an alternative paradigm for answering causal-effect queries over discrete observable variables.
We learn the causal Bayesian network and its confounding latent variables directly from the observational data.
We show that this model completion learning approach can be more effective than estimand approaches.
arXiv Detail & Related papers (2024-08-26T08:39:09Z)
- Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z)
- Estimation of embedding vectors in high dimensions [10.55292041492388]
We consider a simple probability model for discrete data where there is some "true" but unknown embedding.
Under this model, it is shown that the embeddings can be learned by a variant of low-rank approximate message passing (AMP) method.
Our theoretical findings are validated by simulations on both synthetic data and real text data.
arXiv Detail & Related papers (2023-12-12T23:41:59Z)
- Supervised Feature Compression based on Counterfactual Analysis [3.2458225810390284]
This work aims to leverage Counterfactual Explanations to detect the important decision boundaries of a pre-trained black-box model.
Using the discretized dataset, an optimal Decision Tree can be trained that resembles the black-box model, but that is interpretable and compact.
arXiv Detail & Related papers (2022-11-17T21:16:14Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the sample regime and in the finite regime.
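The score-matching estimator this summary analyzes can be made concrete on a toy model. The sketch below fits the variance of a zero-mean Gaussian by minimizing the explicit score-matching objective J = E[0.5·s(x)² + s′(x)] with s(x) = −x/σ² over a grid; it is an illustrative example of the technique only, not the paper's estimator or analysis, and the names (`esm_objective`, `fit_sigma2`) are assumptions.

```python
import numpy as np

def esm_objective(sigma2, x):
    """Explicit score-matching objective for the model N(0, sigma2):
    J = E[0.5 * s(x)^2 + s'(x)], where s(x) = -x / sigma2 is the model
    score, so J = E[0.5 * x^2 / sigma2^2 - 1 / sigma2]."""
    return np.mean(0.5 * x ** 2 / sigma2 ** 2 - 1.0 / sigma2)

def fit_sigma2(x, grid):
    """Pick the grid point minimizing the score-matching objective.
    Analytically, the minimizer is the second moment E[x^2]."""
    vals = [esm_objective(s2, x) for s2 in grid]
    return grid[int(np.argmin(vals))]
```

No normalizing constant appears in the objective — the standard appeal of score matching — and for this Gaussian model the minimizer coincides with the moment estimator E[x²].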
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
- Analyzing Explainer Robustness via Probabilistic Lipschitzness of Prediction Functions [3.5199856477763722]
We focus on one particular aspect of robustness, namely that an explainer should give similar explanations for similar data inputs.
We formalize this notion by introducing and defining explainer astuteness, analogous to astuteness of prediction functions.
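The notion that "similar inputs should receive similar explanations" can be estimated empirically: sample nearby input pairs and count how often their explanations stay within a tolerance. This is an illustrative Monte Carlo sketch of a probabilistic-Lipschitz-style quantity, not the paper's formal definition of explainer astuteness; the sampling scheme and names (`empirical_astuteness`, `r`, `delta`) are assumptions.

```python
import numpy as np

def empirical_astuteness(explain, X, r=0.1, delta=0.5, n_pairs=500, seed=0):
    """Estimate the probability that two inputs within distance r of each
    other receive explanations within distance delta of each other."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_pairs):
        x = X[rng.integers(len(X))]
        # perturbation with Euclidean norm at most r
        xp = x + rng.uniform(-r, r, size=x.shape) / np.sqrt(x.size)
        if np.linalg.norm(explain(x) - explain(xp)) <= delta:
            hits += 1
    return hits / n_pairs
```

An explainer that is Lipschitz on the data region scores 1.0 under a compatible (r, delta); discontinuous explainers score lower near their jumps.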
arXiv Detail & Related papers (2022-06-24T19:43:33Z)
- Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z)
- On the Tractability of SHAP Explanations [40.829629145230356]
SHAP explanations are a popular feature-attribution mechanism for explainable AI.
We show that the complexity of computing the SHAP explanation is the same as the complexity of computing the expected value of the model.
Going beyond fully-factorized distributions, we show that computing SHAP explanations is already intractable for a very simple setting.
arXiv Detail & Related papers (2020-09-18T05:48:15Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.