On the Interpretability of Attention Networks
- URL: http://arxiv.org/abs/2212.14776v3
- Date: Sun, 14 May 2023 04:12:01 GMT
- Title: On the Interpretability of Attention Networks
- Authors: Lakshmi Narayan Pandey, Rahul Vashisht and Harish G. Ramaswamy
- Abstract summary: We show how an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training.
We evaluate a few attention model learning algorithms designed to encourage sparsity and demonstrate that these algorithms help improve interpretability.
- Score: 1.299941371793082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Attention mechanisms form a core component of several successful deep
learning architectures, and are based on one key idea: "The output depends
only on a small (but unknown) segment of the input." In several practical
applications like image captioning and language translation, this is mostly
true. In trained models with an attention mechanism, the output of an
intermediate module that encodes the segment of input responsible for the
network's output is often used as a way to peek into the "reasoning" of the
network. We
make such a notion more precise for a variant of the classification problem
that we term selective dependence classification (SDC) when used with attention
model architectures. Under such a setting, we demonstrate various error modes
where an attention model can be accurate but fail to be interpretable, and show
that such models do occur as a result of training. We illustrate various
situations that can accentuate and mitigate this behaviour. Finally, we use our
objective definition of interpretability for SDC tasks to evaluate a few
attention model learning algorithms designed to encourage sparsity and
demonstrate that these algorithms help improve interpretability.
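As a concrete illustration of the setting described in the abstract (not the authors' exact architecture), the sketch below shows a minimal soft-attention classifier whose attention weights can be read off as the model's claimed "responsible" input segment, together with an entropy penalty as one simple example of a sparsity-encouraging training term; all class, variable, and hyperparameter names are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact architecture): a soft-attention
# classifier that exposes its attention weights so they can be inspected as
# the model's claimed "responsible" input segment. The entropy penalty is one
# simple example of a sparsity-encouraging training term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionClassifier(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)          # per-segment attention score
        self.classify = nn.Linear(feat_dim, num_classes)

    def forward(self, x):                            # x: (batch, segments, feat_dim)
        alpha = F.softmax(self.score(x).squeeze(-1), dim=-1)  # (batch, segments)
        pooled = torch.einsum("bs,bsd->bd", alpha, x)          # attention-weighted sum
        return self.classify(pooled), alpha

def loss_with_sparsity(logits, alpha, labels, lam=0.1):
    """Cross-entropy plus an entropy penalty that pushes the attention
    distribution toward a single segment."""
    entropy = -(alpha * (alpha + 1e-8).log()).sum(dim=-1).mean()
    return F.cross_entropy(logits, labels) + lam * entropy

# Interpretability "peek": the argmax of alpha is read as the segment the
# model claims its decision depends on.
model = AttentionClassifier(feat_dim=64, num_classes=10)
x = torch.randn(8, 5, 64)                            # 8 inputs, 5 candidate segments
logits, alpha = model(x)
claimed_segment = alpha.argmax(dim=-1)
```

Under the SDC framing, interpretability can then be scored objectively by checking how often claimed_segment matches the input segment that actually determines the label.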
Related papers
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- Invariant Causal Mechanisms through Distribution Matching [86.07327840293894]
In this work we provide a causal perspective and a new algorithm for learning invariant representations.
Empirically we show that this algorithm works well on a diverse set of tasks and in particular we observe state-of-the-art performance on domain generalization.
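The summary above does not say which matching criterion is used; as a hedged illustration only, the sketch below aligns representations from two environments with an RBF-kernel MMD penalty. The MMD choice, helper names, and loss weights are assumptions, not the paper's algorithm.

```python
# Hedged sketch: matching representation distributions across two environments
# with an RBF-kernel MMD penalty. MMD is an illustrative stand-in, not
# necessarily the criterion used in the paper.
import torch
import torch.nn.functional as F

def rbf_mmd(z_a, z_b, sigma=1.0):
    """Squared maximum mean discrepancy between two batches of features."""
    def k(x, y):
        return torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))
    return k(z_a, z_a).mean() + k(z_b, z_b).mean() - 2 * k(z_a, z_b).mean()

def invariance_loss(encoder, classifier, batch_a, batch_b, lam=1.0):
    """Task loss on both environments plus a penalty matching their
    representation distributions."""
    (xa, ya), (xb, yb) = batch_a, batch_b
    za, zb = encoder(xa), encoder(xb)
    task = F.cross_entropy(classifier(za), ya) + F.cross_entropy(classifier(zb), yb)
    return task + lam * rbf_mmd(za, zb)
```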
arXiv Detail & Related papers (2022-06-23T12:06:54Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
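The summary above only says that "certain class information" is randomly eliminated in each iteration. One plausible reading, sketched below with illustrative names and not to be taken as the paper's procedure, is to hide the supervision of a randomly chosen subset of classes at every training step.

```python
# Hedged sketch of "randomly eliminating certain class information" per
# iteration: pixels of a randomly chosen subset of classes are ignored in the
# segmentation loss for that step. A plausible reading, not the paper's method.
import torch
import torch.nn.functional as F

def masked_segmentation_loss(logits, target, num_classes, drop_frac=0.2,
                             ignore_index=255):
    """logits: (B, C, H, W); target: (B, H, W) with integer class ids."""
    n_drop = max(1, int(drop_frac * num_classes))
    dropped = torch.randperm(num_classes)[:n_drop]   # classes hidden this step
    target = target.clone()
    for c in dropped.tolist():
        target[target == c] = ignore_index           # remove their supervision
    return F.cross_entropy(logits, target, ignore_index=ignore_index)
```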
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and state-of-the-art attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-06-09T17:46:22Z)
- Bayesian Attention Modules [65.52970388117923]
We propose a scalable version of attention that is easy to implement and optimize.
Our experiments show the proposed method brings consistent improvements over the corresponding baselines.
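As a rough, hedged illustration of attention with stochastic weights (not the paper's exact construction), the sketch below samples unnormalized weights from a reparameterized log-normal centred on the usual scaled dot-product logits and then normalizes them; the noise scale and function names are assumptions.

```python
# Hedged sketch of stochastic attention: unnormalized weights are sampled from
# a reparameterized log-normal centred on the usual scaled dot-product logits,
# then normalized. Illustrative only; not the paper's exact distributions.
import torch

def stochastic_attention(q, k, v, sigma=0.5, n_samples=1):
    """q: (B, Lq, D), k: (B, Lk, D), v: (B, Lk, D) -> list of (B, Lq, D)."""
    logits = q @ k.transpose(-1, -2) / (q.shape[-1] ** 0.5)
    logits = logits - logits.max(dim=-1, keepdim=True).values   # numerical stability
    samples = []
    for _ in range(n_samples):
        eps = torch.randn_like(logits)
        w = torch.exp(logits + sigma * eps)          # reparameterized log-normal weight
        attn = w / w.sum(dim=-1, keepdim=True)       # normalize to a distribution
        samples.append(attn @ v)                     # resample to probe uncertainty
    return samples
```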
arXiv Detail & Related papers (2020-10-20T20:30:55Z)
- A Framework to Learn with Interpretation [2.3741312212138896]
We present a novel framework to jointly learn a predictive model and its associated interpretation model.
We seek a small dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers.
A detailed pipeline to visualize the learnt features is also developed.
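A minimal, hedged sketch of the dictionary-of-attributes idea is given below: a small interpreter head maps selected hidden-layer outputs to a few attribute functions and is trained to mimic the predictor under a sparsity penalty. The sizes, the KL fidelity term, and the L1 penalty are illustrative assumptions rather than the framework's actual design.

```python
# Hedged sketch: a small "interpreter" head that maps selected hidden-layer
# outputs to a dictionary of attribute functions and mimics the predictor
# under an L1 sparsity penalty. Sizes and penalties are illustrative.
import torch.nn as nn
import torch.nn.functional as F

class Interpreter(nn.Module):
    def __init__(self, hidden_dim: int, num_attributes: int, num_classes: int):
        super().__init__()
        self.attributes = nn.Sequential(nn.Linear(hidden_dim, num_attributes),
                                        nn.Sigmoid())           # attribute dictionary
        self.readout = nn.Linear(num_attributes, num_classes)   # interpretable readout

    def forward(self, hidden):                # hidden: output of a selected layer
        a = self.attributes(hidden)
        return self.readout(a), a

def interpretation_loss(pred_logits, interp_logits, attributes, lam=0.01):
    """Agree with the predictor while keeping few attributes active."""
    fidelity = F.kl_div(interp_logits.log_softmax(-1),
                        pred_logits.softmax(-1), reduction="batchmean")
    return fidelity + lam * attributes.abs().mean()
```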
arXiv Detail & Related papers (2020-10-19T09:26:28Z)
- Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models [5.866941279460248]
We propose a visual analytics approach to understanding fine-tuning in attention-based language models.
Our visualization, Attention Flows, is designed to support users in querying, tracing, and comparing attention within layers, across layers, and amongst attention heads in Transformer-based language models.
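For readers who want the raw data such a visualization consumes, the short sketch below collects per-layer, per-head attention matrices from a Hugging Face Transformer; the model name is an arbitrary example and the plotting itself is omitted.

```python
# Sketch: collecting the per-layer, per-head attention matrices that a tool
# like Attention Flows would visualize. The model name is an arbitrary example.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tok("attention can be traced across layers and heads", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is a tuple with one tensor per layer, each of shape
# (batch, num_heads, seq_len, seq_len).
for layer, attn in enumerate(out.attentions):
    print(f"layer {layer}: {attn.shape[1]} heads, sequence length {attn.shape[-1]}")
```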
arXiv Detail & Related papers (2020-09-03T19:56:30Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
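A hedged sketch of how counterfactual pairs can provide gradient supervision is shown below: the input gradient of the task loss is encouraged to align with the direction from an example to its minimally different, differently labelled twin. The cosine-alignment form and all names are illustrative assumptions, not the paper's exact objective.

```python
# Hedged sketch: auxiliary "gradient supervision" from counterfactual pairs.
# The input gradient of the task loss is pushed to align with the vector from
# an example to its minimally different, differently labelled counterpart.
import torch
import torch.nn.functional as F

def gradient_supervision_loss(model, x, y, x_cf, y_cf, lam=1.0):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    base = F.cross_entropy(logits, y)
    task = base + F.cross_entropy(model(x_cf), y_cf)
    # Gradient of the original example's loss with respect to its input.
    g = torch.autograd.grad(base, x, create_graph=True)[0]
    direction = (x_cf - x).detach()
    cos = F.cosine_similarity(g.flatten(1), direction.flatten(1), dim=1)
    return task + lam * (1.0 - cos).mean()       # reward alignment with the pair direction
```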
arXiv Detail & Related papers (2020-04-20T02:47:49Z)