A Formal Approach to Explainability
- URL: http://arxiv.org/abs/2001.05207v1
- Date: Wed, 15 Jan 2020 10:06:47 GMT
- Title: A Formal Approach to Explainability
- Authors: Lior Wolf, Tomer Galanti, Tamir Hazan
- Abstract summary: We study the links between explanation-generating functions and intermediate representations of learned models.
We study the intersection and union of explanations as a way to construct new explanations.
- Score: 100.12889473240237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We regard explanations as a blending of the input sample and the model's
output and offer a few definitions that capture various desired properties of
the function that generates these explanations. We study the links between
these properties and between explanation-generating functions and intermediate
representations of learned models, and are able to show, for example, that if
the activations of a given layer are consistent with an explanation, then so are
the activations of all subsequent layers. In addition, we study the intersection and union
of explanations as a way to construct new explanations.
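A minimal formal sketch of the layer-consistency claim, under assumed notation (the symbols and the reading of "consistent" below are illustrative, not necessarily the paper's exact definitions). Write the network as $f = f_L \circ \cdots \circ f_1$ with layer activations $h_\ell(x) = (f_\ell \circ \cdots \circ f_1)(x)$, and let $e$ be an explanation-generating function. One natural reading of "layer $\ell$ is consistent with $e$" is
\[
\forall x_1, x_2:\quad e(x_1) = e(x_2) \;\Rightarrow\; h_\ell(x_1) = h_\ell(x_2).
\]
Since $h_{\ell+1} = f_{\ell+1} \circ h_\ell$, the implication propagates forward, so under this reading every subsequent layer is consistent with $e$ as well.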
Related papers
- Learning with Explanation Constraints [91.23736536228485]
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z)
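As a hedged illustration of the entry above (learning with explanation constraints), the sketch below adds a generic explanation-based regularizer to a standard training loss: input gradients are penalized on features that an annotated explanation marks as irrelevant. The function and argument names are assumptions for illustration; this is not necessarily the paper's formulation.

```python
# Hedged sketch: generic "learning with an explanation constraint" objective.
# Names (model, irrelevant_mask, lam) are illustrative assumptions.
import torch
import torch.nn.functional as F

def loss_with_explanation_constraint(model, x, y, irrelevant_mask, lam=1.0):
    # x: (B, D) continuous inputs, y: (B,) class labels,
    # irrelevant_mask: (B, D) with 1s where an explanation says the feature should not matter.
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)                                   # (B, C)
    task_loss = F.cross_entropy(logits, y)
    # Input gradient of the correct-class scores, kept differentiable for training.
    correct_scores = logits.gather(1, y[:, None]).sum()
    grads = torch.autograd.grad(correct_scores, x, create_graph=True)[0]
    # Explanation constraint: saliency should vanish on features marked irrelevant.
    penalty = (grads * irrelevant_mask).pow(2).mean()
    return task_loss + lam * penalty
```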
- ELUDE: Generating interpretable explanations via a decomposition into labelled and unlabelled features [23.384134043048807]
We develop an explanation framework that decomposes a model's prediction into two parts: one attributable to labelled features and one to unlabelled features.
By identifying the latter, we are able to analyze the "unexplained" portion of the model.
We show that the set of unlabelled features can generalize to multiple models trained with the same feature space.
arXiv Detail & Related papers (2022-06-15T17:36:55Z)
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all of these features into a single explanation, which is harder for humans to interpret.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
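A hedged sketch of one common way to compute contrastive explanations of the kind described in the entry above: score each input position by the gradient of the difference between the target-token and foil-token logits ("why this token rather than that one?"). The model interface (inputs_embeds, .logits) and all names are assumptions, not necessarily the paper's exact method.

```python
# Hedged sketch: contrastive gradient saliency for a next-token prediction.
import torch

def contrastive_token_saliency(model, input_embeds, target_id, foil_id):
    # input_embeds: (1, T, d) token embeddings; model(inputs_embeds=...) is assumed to
    # return an object with .logits of shape (1, T, V). Names are illustrative.
    input_embeds = input_embeds.clone().detach().requires_grad_(True)
    logits = model(inputs_embeds=input_embeds).logits
    next_token_logits = logits[0, -1]
    # Evidence for predicting the target token rather than the foil token.
    contrast = next_token_logits[target_id] - next_token_logits[foil_id]
    contrast.backward()
    # L2 norm of the gradient at each input position as its saliency score.
    return input_embeds.grad.norm(dim=-1).squeeze(0)    # shape (T,)
```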
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- Topological Representations of Local Explanations [8.559625821116454]
We propose a topology-based framework to extract a simplified representation from a set of local explanations.
We demonstrate that our framework can not only reliably identify differences between explainability techniques but also provide stable representations.
arXiv Detail & Related papers (2022-01-06T17:46:45Z)
- Diagnostics-Guided Explanation Generation [32.97930902104502]
Explanations shed light on a machine learning model's rationales and can aid in identifying deficiencies in its reasoning process.
We show how to optimise for several diagnostic properties when training a model to generate sentence-level explanations.
arXiv Detail & Related papers (2021-09-08T16:27:52Z)
- On the overlooked issue of defining explanation objectives for local-surrogate explainers [5.094061357656677]
Local surrogate approaches for explaining machine learning model predictions have appealing properties.
Several methods fit this description and share the goal of explaining individual predictions with local surrogate models.
We discuss how the lack of agreement and clarity among these methods' objectives affects the research and practice of explainability.
arXiv Detail & Related papers (2021-06-10T15:24:49Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting the model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
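A toy illustration of the contrast drawn in the entry above (my own example, not taken from the paper): for f(x1, x2) = x1 OR x2 at the instance (1, 1) with an all-zeros baseline, exact Shapley values split the credit evenly, whereas either feature alone already forms a minimal sufficient subset.

```python
# Toy comparison: Shapley values vs. minimal sufficient subsets for f(x1, x2) = x1 OR x2.
from itertools import combinations
from math import factorial

def f(x1, x2):
    return int(bool(x1) or bool(x2))

x = (1, 1)          # instance to explain
baseline = (0, 0)   # reference values used for "absent" features

def value(subset):
    # Features in `subset` keep their instance values; the rest are set to the baseline.
    return f(*[x[i] if i in subset else baseline[i] for i in range(2)])

def shapley(i, n=2):
    others = [j for j in range(n) if j != i]
    total = 0.0
    for k in range(n):
        for s in combinations(others, k):
            weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            total += weight * (value(set(s) | {i}) - value(set(s)))
    return total

print([shapley(i) for i in range(2)])   # [0.5, 0.5]: Shapley spreads credit across both features
print([value({0}), value({1})])         # [1, 1]: each single feature already fixes the prediction
```

In this toy case a Shapley explainer deems both features equally important, while a minimal-sufficient-subset explainer can return either feature on its own, which illustrates how the two explainer classes target different notions of ground truth.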
- Sequential Explanations with Mental Model-Based Policies [20.64968620536829]
We apply a reinforcement learning framework to provide explanations based on the explainee's mental model.
We conduct novel online human experiments where explanations are selected and presented to participants.
Our results suggest that mental model-based policies may increase interpretability over multiple sequential explanations.
arXiv Detail & Related papers (2020-07-17T14:43:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.