Towards Transparent and Explainable Attention Models
- URL: http://arxiv.org/abs/2004.14243v1
- Date: Wed, 29 Apr 2020 14:47:50 GMT
- Title: Towards Transparent and Explainable Attention Models
- Authors: Akash Kumar Mohankumar, Preksha Nema, Sharan Narasimhan, Mitesh M.
Khapra, Balaji Vasan Srinivasan, Balaraman Ravindran
- Abstract summary: We first explain why current attention mechanisms in LSTM-based encoders can provide neither a faithful nor a plausible explanation of the model's predictions.
We propose a modified LSTM cell with a diversity-driven training objective that ensures that the hidden representations learned at different time steps are diverse.
Human evaluations indicate that the attention distributions learned by our model offer a plausible explanation of the model's predictions.
- Score: 34.0557018891191
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies on interpretability of attention distributions have led to
notions of faithful and plausible explanations for a model's predictions.
Attention distributions can be considered a faithful explanation if a higher
attention weight implies a greater impact on the model's prediction. They can
be considered a plausible explanation if they provide a human-understandable
justification for the model's predictions. In this work, we first explain why
current attention mechanisms in LSTM-based encoders can provide neither a
faithful nor a plausible explanation of the model's predictions. We observe
that in LSTM-based encoders the hidden representations at different time steps
are very similar to each other (high conicity), and in these situations the
attention weights carry little meaning because even a random permutation of the
attention weights does not affect the model's predictions. Based on experiments
on a wide variety of tasks and datasets, we observe that attention distributions
often attribute the model's predictions to unimportant words such as
punctuation and fail to offer a plausible explanation for the predictions. To
make attention mechanisms more faithful and plausible, we propose a modified
LSTM cell with a diversity-driven training objective that ensures that the
hidden representations learned at different time steps are diverse. We show
that the resulting attention distributions offer more transparency as they (i)
provide a more precise importance ranking of the hidden states, (ii) are better
indicative of the words important for the model's predictions, and (iii) correlate
better with gradient-based attribution methods. Human evaluations indicate that
the attention distributions learned by our model offer a plausible explanation
of the model's predictions. Our code has been made publicly available at
https://github.com/akashkm99/Interpretable-Attention
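For readers who want to reproduce the conicity diagnostic described above, the sketch below computes conicity as the mean cosine similarity between each hidden state and the mean of all hidden states. This is a minimal PyTorch sketch; the function name, tensor shapes, and example values are illustrative rather than taken from the released code.

```python
import torch
import torch.nn.functional as F

def conicity(hidden_states: torch.Tensor) -> torch.Tensor:
    """Mean cosine similarity of each hidden state to the mean hidden state.

    hidden_states: shape (seq_len, hidden_dim) for a single sequence.
    Values near 1 mean the representations all point in roughly the same
    direction (low diversity); values near 0 mean they are spread out.
    """
    mean_vec = hidden_states.mean(dim=0, keepdim=True)                # (1, hidden_dim)
    alignment = F.cosine_similarity(hidden_states, mean_vec, dim=-1)  # (seq_len,)
    return alignment.mean()

# Example: stand-in hidden states for a 20-token sentence from an LSTM encoder.
h = torch.randn(20, 256)
print(conicity(h).item())
```

A complementary diagnostic mentioned in the abstract is to randomly permute the attention weights and compare the resulting predictions with the original ones; if the outputs barely change, the attention distribution is not a faithful explanation.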
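The diversity-driven training objective can likewise be sketched, under the assumption that it simply adds a conicity penalty on the encoder's hidden states to the usual task loss; `diversity_driven_loss` and `lambda_div` below are illustrative names and values, not the paper's exact formulation or hyperparameters.

```python
import torch
import torch.nn.functional as F

def diversity_driven_loss(logits: torch.Tensor,
                          labels: torch.Tensor,
                          hidden_states: torch.Tensor,
                          lambda_div: float = 0.5) -> torch.Tensor:
    """Task loss plus a penalty on conicity, pushing the hidden
    representations at different time steps away from each other.

    logits: (batch, num_classes); labels: (batch,);
    hidden_states: (batch, seq_len, hidden_dim).
    lambda_div is an assumed trade-off weight, not a value from the paper.
    """
    task_loss = F.cross_entropy(logits, labels)
    mean_vec = hidden_states.mean(dim=1, keepdim=True)               # (batch, 1, hidden_dim)
    conicity = F.cosine_similarity(hidden_states, mean_vec, dim=-1)  # (batch, seq_len)
    return task_loss + lambda_div * conicity.mean()
```

Minimizing this combined loss trades task accuracy against hidden-state diversity; lower conicity is what, according to the abstract, makes the attention weights more meaningful.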
Related papers
- Towards Generalizable and Interpretable Motion Prediction: A Deep Variational Bayes Approach [54.429396802848224]
This paper proposes an interpretable generative model for motion prediction with robust generalizability to out-of-distribution cases.
For interpretability, the model achieves target-driven motion prediction by estimating the spatial distribution of long-term destinations.
Experiments on motion prediction datasets validate that the fitted model can be interpretable and generalizable.
arXiv Detail & Related papers (2024-03-10T04:16:04Z)
- Analyzing Explainer Robustness via Probabilistic Lipschitzness of Prediction Functions [3.5199856477763722]
We focus on one particular aspect of robustness, namely that an explainer should give similar explanations for similar data inputs.
We formalize this notion by introducing and defining explainer astuteness, analogous to the astuteness of prediction functions.
arXiv Detail & Related papers (2022-06-24T19:43:33Z)
- VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best-performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z)
- Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models with few examples exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows that models gain performance improvements by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z)
- Counterfactual Explanations for Predictive Business Process Monitoring [0.90238471756546]
We propose LORELEY, a counterfactual explanation technique for predictive process monitoring.
LORELEY can approximate prediction models with an average fidelity of 97.69% and generate realistic counterfactual explanations.
arXiv Detail & Related papers (2022-02-24T11:01:20Z)
- Deconfounding to Explanation Evaluation in Graph Neural Networks [136.73451468551656]
We argue that a distribution shift exists between the full graph and the subgraph, causing the out-of-distribution problem.
We propose Deconfounded Subgraph Evaluation (DSE), which assesses the causal effect of an explanatory subgraph on the model prediction.
arXiv Detail & Related papers (2022-01-21T18:05:00Z)
- Is Sparse Attention more Interpretable? [52.85910570651047]
We investigate how sparsity affects our ability to use attention as an explainability tool.
We find that only a weak relationship exists between inputs and co-indexed intermediate representations under sparse attention.
We observe in this setting that inducing sparsity may make it less plausible that attention can be used as a tool for understanding model behavior.
arXiv Detail & Related papers (2021-06-02T11:42:56Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality, valuable explanations compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Explainable Artificial Intelligence: How Subsets of the Training Data Affect a Prediction [2.3204178451683264]
We propose a novel methodology, which we call Shapley values for training data subset importance.
We show how the proposed explanations can be used to reveal bias in models and erroneous training data.
We argue that the explanations enable us to perceive more of the inner workings of the algorithms, and illustrate how models producing similar predictions can be based on very different parts of the training data.
arXiv Detail & Related papers (2020-12-07T12:15:47Z)
- Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning [9.279259759707996]
Causal approaches to post-hoc explainability for black-box prediction models have become increasingly popular.
We learn causal graphical representations that allow for arbitrary unmeasured confounding among features.
Our approach is motivated by a counterfactual theory of causal explanation wherein good explanations point to factors that are "difference-makers" in an interventionist sense.
arXiv Detail & Related papers (2020-06-03T19:02:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.