Latent SHAP: Toward Practical Human-Interpretable Explanations
- URL: http://arxiv.org/abs/2211.14797v1
- Date: Sun, 27 Nov 2022 11:33:26 GMT
- Title: Latent SHAP: Toward Practical Human-Interpretable Explanations
- Authors: Ron Bitton, Alon Malach, Amiel Meiseles, Satoru Momiyama, Toshinori
Araki, Jun Furukawa, Yuval Elovici and Asaf Shabtai
- Abstract summary: We introduce Latent SHAP, a black-box feature attribution framework that provides human-interpretable explanations.
We demonstrate Latent SHAP's effectiveness using (1) a controlled experiment where invertible transformation functions are available, which enables robust quantitative evaluation of our method, and (2) celebrity attractiveness classification (using the CelebA dataset) where invertible transformation functions are not available.
- Score: 44.28376542666342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model agnostic feature attribution algorithms (such as SHAP and LIME) are
ubiquitous techniques for explaining the decisions of complex classification
models, such as deep neural networks. However, since complex classification
models produce superior performance when trained on low-level (or encoded)
features, in many cases, the explanations generated by these algorithms are
neither interpretable nor usable by humans. Methods proposed in recent studies
that support the generation of human-interpretable explanations are
impractical, because they require a fully invertible transformation function
that maps the model's input features to the human-interpretable features. In
this work, we introduce Latent SHAP, a black-box feature attribution framework
that provides human-interpretable explanations, without the requirement for a
fully invertible transformation function. We demonstrate Latent SHAP's
effectiveness using (1) a controlled experiment where invertible transformation
functions are available, which enables robust quantitative evaluation of our
method, and (2) celebrity attractiveness classification (using the CelebA
dataset) where invertible transformation functions are not available, which
enables thorough qualitative evaluation of our method.
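To make the idea concrete, here is a minimal, hypothetical sketch of attributing a model's output to human-interpretable features when the transformation is not invertible. It is not the paper's Latent SHAP algorithm: it fits a LIME-style local linear surrogate of the model output on the transformed features, and `model` and `t` are invented stand-ins.
```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical stand-ins (not from the paper): `model` scores low-level
# features x, and `t` maps x to human-interpretable features h.
# t is many-to-one, so it has no inverse.
def model(X):
    return X @ np.array([0.5, -1.2, 0.8, 0.3])

def t(X):
    return np.column_stack([X[:, 0] + X[:, 1], X[:, 2] * X[:, 3]])

def interpretable_attributions(x, n_samples=2000, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Perturb the instance in the low-level input space ...
    Xp = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    y = model(Xp)   # model outputs for the perturbations
    H = t(Xp)       # ... but describe each perturbation interpretably
    # Local linear surrogate of the output as a function of h = t(x);
    # its coefficients act as attributions over the interpretable features.
    surrogate = Ridge(alpha=1e-3).fit(H - t(x[None, :]), y - model(x[None, :]))
    return surrogate.coef_

x0 = np.array([1.0, 0.5, -0.2, 0.7])
print(interpretable_attributions(x0))  # one score per interpretable feature
```
The point of contact with Latent SHAP is that only the forward map t is ever evaluated; no inverse of t is required.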
Related papers
- Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales [3.242050660144211]
Saliency post-hoc explainability methods are important tools for understanding increasingly complex NLP models.
We present a methodology for incorporating rationales, which are text annotations explaining human decisions, into text classification models.
arXiv Detail & Related papers (2024-04-03T22:39:33Z)
- FIND: A Function Description Benchmark for Evaluating Interpretability Methods [86.80718559904854]
This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating automated interpretability methods.
FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate.
We evaluate methods that use pretrained language models to produce descriptions of function behavior in natural language and code.
arXiv Detail & Related papers (2023-09-07T17:47:26Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- Koopman operator learning using invertible neural networks [0.6846628460229516]
In Koopman operator theory, a finite-dimensional nonlinear system is transformed into an infinite-dimensional linear system using a set of observable functions.
Current methodologies tend to disregard the importance of the invertibility of observable functions, which leads to inaccurate results.
We propose FlowDMD, a Flow-based Dynamic Mode Decomposition method that utilizes the Coupling Flow Invertible Neural Network (CF-INN) framework.
arXiv Detail & Related papers (2023-06-30T04:26:46Z)
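As background for this entry, the sketch below is plain Dynamic Mode Decomposition on a fixed dictionary of observables; it is not FlowDMD, where the hand-picked lifting g would be replaced by a learned invertible coupling-flow network. The toy system and observables are invented for illustration.
```python
import numpy as np

# Toy nonlinear system x_{k+1} = f(x_k), invented for illustration.
def f(x):
    return np.array([0.9 * x[0], 0.5 * x[1] + 0.2 * x[0] ** 2])

# Fixed observable dictionary g(x); lifting to [x1, x2, x1^2] happens to
# make this particular system exactly linear in the lifted coordinates.
def g(x):
    return np.array([x[0], x[1], x[0] ** 2])

# Collect lifted snapshots along one trajectory.
x = np.array([1.0, 0.3])
snapshots = []
for _ in range(100):
    snapshots.append(g(x))
    x = f(x)
G = np.array(snapshots).T          # shape (3, 100)
X1, X2 = G[:, :-1], G[:, 1:]

# DMD: least-squares fit of a linear operator A with X2 ~= A @ X1,
# a finite-dimensional approximation of the Koopman operator.
A = X2 @ np.linalg.pinv(X1)
print(np.round(np.linalg.eigvals(A), 3))  # ~ 0.9, 0.5, 0.81 in some order
```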
- ContraFeat: Contrasting Deep Features for Semantic Discovery [102.4163768995288]
StyleGAN has shown strong potential for disentangled semantic control.
Existing semantic discovery methods on StyleGAN rely on manual selection of modified latent layers to obtain satisfactory manipulation results.
We propose a model that automates this process and achieves state-of-the-art semantic discovery performance.
arXiv Detail & Related papers (2022-12-14T15:22:13Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose Model-Agnostic Counterfactual Explanation (MACE), a novel framework that pairs an RL-based method for finding good counterfactual examples with a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, with better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
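For orientation, here is a deliberately simple gradient-free counterfactual search over the same objective MACE targets (validity, proximity, sparsity). It is plain random search, not the paper's RL-based method or gradient-less descent, and the toy classifier is invented.
```python
import numpy as np

def random_counterfactual(predict, x, target, n_iter=20000, scale=0.3, seed=0):
    """Random-search baseline: find x' with predict(x') == target, minimizing
    the L1 distance to x (small L1 also encourages sparse changes)."""
    rng = np.random.default_rng(seed)
    best, best_dist = None, np.inf
    for _ in range(n_iter):
        cand = x + scale * rng.standard_normal(x.shape)  # random proposal
        if predict(cand) == target:                      # validity
            dist = np.abs(cand - x).sum()                # proximity
            if dist < best_dist:
                best, best_dist = cand, dist
    return best, best_dist

predict = lambda v: int(v[0] + v[1] > 1.0)   # invented toy classifier
x = np.array([0.2, 0.3])                     # currently predicted as class 0
cf, dist = random_counterfactual(predict, x, target=1)
print(cf, round(dist, 3))                    # a nearby input classified as 1
```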
- Human-interpretable model explainability on high-dimensional data [8.574682463936007]
We introduce a framework for human-interpretable explainability on high-dimensional data, consisting of two modules.
First, we apply a semantically meaningful latent representation, both to reduce the raw dimensionality of the data and to ensure its human interpretability.
Second, we adapt the Shapley paradigm for model-agnostic explainability to operate on these latent features. This leads to interpretable model explanations that are both theoretically controlled and computationally tractable.
arXiv Detail & Related papers (2020-10-14T20:06:28Z)
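A compact sketch of the two-module recipe this entry describes, with loud stand-ins: PCA plays the role of the "semantically meaningful latent representation" (a real application would need a representation whose axes humans can actually read), and KernelSHAP from the shap package supplies the model-agnostic Shapley values over the latent features.
```python
import numpy as np
import shap
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))            # "high-dimensional" raw data
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # invented labels

latent = PCA(n_components=3).fit(X)           # module 1: latent representation
clf = LogisticRegression().fit(X, y)          # black-box model on raw features

# Compose latent -> (approximate) raw -> class probability, so the Shapley
# game is played over the 3 latent features instead of the 20 raw ones.
def f_latent(Z):
    return clf.predict_proba(latent.inverse_transform(Z))[:, 1]

Z = latent.transform(X)
explainer = shap.KernelExplainer(f_latent, shap.sample(Z, 50))  # module 2
phi = explainer.shap_values(Z[:1], nsamples=200)
print(np.round(phi, 3))   # one Shapley value per latent feature
```
Explaining over three latent features rather than twenty raw ones is also what keeps the KernelSHAP computation tractable, which is the "computationally tractable" claim above.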
- Invariant Feature Coding using Tensor Product Representation [75.62232699377877]
We prove that the group-invariant feature vector contains sufficient discriminative information when learning a linear classifier.
A novel feature model that explicitly considers group actions is proposed for principal component analysis and k-means clustering.
arXiv Detail & Related papers (2019-06-05T07:15:17Z)
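Finally, a generic illustration of the idea behind this last entry, not the paper's construction: averaging a tensor-product (outer-product) feature map over a group's action produces features invariant to that action, here the group of cyclic shifts.
```python
import numpy as np

def invariant_features(x):
    """Average an outer-product (tensor-product) feature map over all cyclic
    shifts of x; the result is invariant to cyclically shifting the input."""
    feats = [np.outer(np.roll(x, s), np.roll(x, s)) for s in range(len(x))]
    return np.mean(feats, axis=0).ravel()

x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(invariant_features(x), invariant_features(np.roll(x, 1)))
print(invariant_features(x)[:4])  # identical for x and any cyclic shift of x
```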