The Explanation Game: Towards Prediction Explainability through Sparse Communication
- URL: http://arxiv.org/abs/2004.13876v2
- Date: Mon, 12 Oct 2020 08:05:13 GMT
- Title: The Explanation Game: Towards Prediction Explainability through Sparse Communication
- Authors: Marcos V. Treviso and André F. T. Martins
- Abstract summary: We provide a unified perspective of explainability as a communication problem between an explainer and a layperson.
We use this framework to compare several prior approaches for extracting explanations.
We propose new embedded methods for explainability, through the use of selective, sparse attention.
- Score: 6.497816402045099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explainability is a topic of growing importance in NLP. In this work, we
provide a unified perspective of explainability as a communication problem
between an explainer and a layperson about a classifier's decision. We use this
framework to compare several prior approaches for extracting explanations,
including gradient methods, representation erasure, and attention mechanisms,
in terms of their communication success. In addition, we reinterpret these
methods in light of classical feature selection, and we use this as
inspiration to propose new embedded methods for explainability, through the use
of selective, sparse attention. Experiments in text classification, natural
language entailment, and machine translation, using different configurations of
explainers and laypeople (including both machines and humans), reveal an
advantage of attention-based explainers over gradient and erasure methods.
Furthermore, human evaluation experiments show promising results with post-hoc
explainers trained to optimize communication success and faithfulness.
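To make the "selective, sparse attention" idea concrete, below is a minimal sketch (in plain NumPy, with toy tokens, scores, and function names that are ours, not the authors' code) of how sparsemax (Martins & Astudillo, 2016) turns token scores into a sparse probability distribution, so that only the tokens with nonzero probability form the message sent from explainer to layperson.

```python
import numpy as np

def sparsemax(scores: np.ndarray) -> np.ndarray:
    """Sparsemax (Martins & Astudillo, 2016): projects a score vector onto the
    probability simplex; unlike softmax, many entries become exactly zero."""
    z = np.sort(scores)[::-1]                        # scores in descending order
    cssv = np.cumsum(z)                              # cumulative sums of sorted scores
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z > cssv                       # prefix of entries kept in the support
    tau = (cssv[support][-1] - 1) / k[support][-1]   # threshold
    return np.maximum(scores - tau, 0.0)

# Hypothetical relevance scores that an explainer assigns to input tokens.
tokens = ["the", "plot", "was", "utterly", "boring"]
scores = np.array([0.1, 1.2, 0.0, 2.3, 2.9])

probs = sparsemax(scores)
message = [t for t, p in zip(tokens, probs) if p > 0]
print(probs)    # -> [0.  0.  0.  0.2 0.8]: all mass on two tokens
print(message)  # -> ['utterly', 'boring']: the sparse "message" sent to the layperson
```

Softmax in the same place would assign every token some weight; the exact zeros produced by sparsemax are what keep the communicated explanation short.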
Related papers
- An AI Architecture with the Capability to Explain Recognition Results [0.0]
This research focuses on the importance of metrics for explainability and contributes two methods that yield performance gains.
The first method introduces a combination of explainable and unexplainable flows, proposing a metric to characterize the explainability of a decision.
The second method compares classic metrics for estimating the effectiveness of neural networks in the system, and identifies a new metric as the leading performer.
arXiv Detail & Related papers (2024-06-13T02:00:13Z)
- Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations [53.973055975918655]
We show that intermediate layer representations become more interpretable when transformed into the bases extracted with our method.
We compare the bases extracted with our method to those derived with a supervised approach and find that, in one respect, the proposed unsupervised approach has a strength where the supervised one is limited; we also give potential directions for future research.
arXiv Detail & Related papers (2023-03-19T00:37:19Z)
- A survey on improving NLP models with human explanations [10.14196008734383]
Training a model with access to human explanations can improve data efficiency and model performance on in- and out-of-domain data.
Its similarity to the process of human learning makes learning from explanations a promising way to establish fruitful human-machine interaction.
arXiv Detail & Related papers (2022-04-19T13:43:31Z)
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- On Sample Based Explanation Methods for NLP: Efficiency, Faithfulness, and Semantic Evaluation [23.72825603188359]
We can improve the interpretability of explanations by allowing arbitrary text sequences as the explanation unit.
We propose a semantic-based evaluation metric that can better align with humans' judgment of explanations.
arXiv Detail & Related papers (2021-06-09T00:49:56Z)
- This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning [59.17685450892182]
Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach performs significantly better than two state-of-the-art systems regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy.
arXiv Detail & Related papers (2020-12-22T10:08:05Z)
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions [55.660255727031725]
Influence functions explain the decisions of a model by identifying influential training examples (a toy sketch of the idea appears after this list).
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
arXiv Detail & Related papers (2020-05-14T00:45:23Z)
- Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection [21.02924712220406]
We build hierarchical explanations by detecting feature interactions (see the sketch after this list).
Such explanations visualize how words and phrases are combined at different levels of the hierarchy.
Experiments show the effectiveness of the proposed method in providing explanations both faithful to models and interpretable to humans.
arXiv Detail & Related papers (2020-04-04T20:56:37Z)
- Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning Models [82.3793660091354]
This paper analyzes the predictions of image captioning models with attention mechanisms beyond visualizing the attention itself.
We develop variants of layer-wise relevance propagation (LRP) and gradient-based explanation methods, tailored to image captioning models with attention mechanisms.
arXiv Detail & Related papers (2020-01-04T05:15:11Z)
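For the influence-functions entry above, the following toy sketch shows the quantity being computed (Koh & Liang, 2017): the effect of upweighting one training example on the loss at a test example. The logistic-regression setup, data, and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification problem standing in for a text classifier over
# bag-of-words features; data, model, and sizes are purely illustrative.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Fit L2-regularised logistic regression by gradient descent.
lam, w = 1e-2, np.zeros(d)
for _ in range(2000):
    p = sigmoid(X @ w)
    w -= 0.5 * (X.T @ (p - y) / n + lam * w)

# Influence of training example z_i on the loss at a test point z_test
# (Koh & Liang, 2017):  I(z_i, z_test) = -grad L(z_test)^T H^{-1} grad L(z_i),
# where H is the Hessian of the (regularised) training objective.
p = sigmoid(X @ w)
H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)
H_inv = np.linalg.inv(H)

x_test, y_test = X[0], y[0]                        # pretend this is a held-out example
g_test = (sigmoid(x_test @ w) - y_test) * x_test   # gradient of the test loss
g_train = (p - y)[:, None] * X                     # per-example training gradients

# Negative influence: upweighting the example would lower the test loss.
influences = -g_train @ H_inv @ g_test
print("most helpful training examples:", np.argsort(influences)[:3])
print("most harmful training examples:", np.argsort(influences)[-3:])
```

Practical implementations avoid forming the inverse Hessian explicitly (e.g. with Hessian-vector products and stochastic estimation); the closed-form version above is only meant to show the definition at work.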
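For the hierarchical-explanations entry above, here is a small occlusion-based sketch of one way to detect feature interactions and merge adjacent word spans into a hierarchy. It is an illustrative stand-in under our own assumptions (the toy classifier, mask token, and greedy merging rule are invented for the example), not the paper's algorithm.

```python
def occlusion_importance(predict, tokens, span, mask="[MASK]"):
    """Importance of a span: prediction drop when the span's tokens are masked."""
    masked = [mask if i in span else t for i, t in enumerate(tokens)]
    return predict(tokens) - predict(masked)

def build_hierarchy(predict, tokens):
    """Greedily merge the adjacent pair of spans with the strongest interaction,
    where interaction(A, B) = |imp(A u B) - imp(A) - imp(B)|."""
    spans = [frozenset([i]) for i in range(len(tokens))]
    levels = [list(spans)]
    while len(spans) > 1:
        def interaction(i):
            a, b = spans[i], spans[i + 1]
            return abs(occlusion_importance(predict, tokens, a | b)
                       - occlusion_importance(predict, tokens, a)
                       - occlusion_importance(predict, tokens, b))
        i = max(range(len(spans) - 1), key=interaction)
        spans[i:i + 2] = [spans[i] | spans[i + 1]]
        levels.append(list(spans))
    return levels

# Toy "classifier": the phrase "not bad" flips the sentiment as a unit.
def toy_predict(tokens):
    text = " ".join(tokens)
    return 1.0 if "not bad" in text else (0.2 if "bad" in text else 0.5)

sentence = "the movie was not bad".split()
for level in build_hierarchy(toy_predict, sentence):
    print([" ".join(sentence[j] for j in sorted(s)) for s in level])
# "not" and "bad" are merged first: their joint effect differs from the sum of
# their individual effects, which is exactly what the interaction score detects.
```

Each printed level corresponds to one layer of the hierarchy, from individual words up to the full sentence.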