Attention Meets Post-hoc Interpretability: A Mathematical Perspective
- URL: http://arxiv.org/abs/2402.03485v2
- Date: Mon, 17 Jun 2024 13:18:30 GMT
- Title: Attention Meets Post-hoc Interpretability: A Mathematical Perspective
- Authors: Gianluigi Lopardo, Frederic Precioso, Damien Garreau
- Abstract summary: We mathematically study a simple attention-based architecture and pinpoint the differences between post-hoc and attention-based explanations.
We show that they provide quite different results, and that, despite their limitations, post-hoc methods are capable of capturing more useful insights than merely examining the attention weights.
- Score: 6.492879435794228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention-based architectures, in particular transformers, are at the heart of a technological revolution. Interestingly, in addition to helping obtain state-of-the-art results on a wide range of applications, the attention mechanism intrinsically provides meaningful insights on the internal behavior of the model. Can these insights be used as explanations? Debate rages on. In this paper, we mathematically study a simple attention-based architecture and pinpoint the differences between post-hoc and attention-based explanations. We show that they provide quite different results, and that, despite their limitations, post-hoc methods are capable of capturing more useful insights than merely examining the attention weights.
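As a purely illustrative sketch, and not the architecture or the post-hoc estimators analyzed in the paper, the NumPy snippet below builds a toy single-head attention pooling classifier and contrasts its attention weights with a simple perturbation-based post-hoc importance score (leave-one-token-out occlusion). All names, weights, and dimensions are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

# Toy single-head attention classifier: score(x) = w_out . sum_t alpha_t * (W_v e_t)
d = 8                        # embedding dimension (arbitrary)
E = rng.normal(size=(5, d))  # embeddings of a 5-token input
W_q = rng.normal(size=d)     # fixed query vector, as in simple pooling attention
W_v = rng.normal(size=(d, d))
w_out = rng.normal(size=d)

def predict(emb):
    alpha = softmax(emb @ W_q)                     # attention weights over tokens
    pooled = (alpha[:, None] * (emb @ W_v)).sum(axis=0)
    return w_out @ pooled, alpha

score, attn = predict(E)

# Post-hoc, perturbation-based importance: drop each token and measure the score change.
occlusion = np.array([score - predict(np.delete(E, t, axis=0))[0] for t in range(len(E))])

print("attention weights :", np.round(attn, 3))
print("occlusion scores  :", np.round(occlusion, 3))
print("same top token?   :", attn.argmax() == np.abs(occlusion).argmax())
```

Because the value projection and the output weights reweight each token's contribution after the softmax, the most-attended token need not be the one whose removal changes the score the most. This toy-scale disagreement between attention-based and perturbation-based explanations is the kind of difference the paper studies formally.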
Related papers
- Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation [0.2499907423888049]
Empirical studies suggest that attention maps can serve as an explanation for model output.
Recent studies show that attention weights in RNN encoders are hardly plausible as explanations because they are spread diffusely over the input tokens.
We propose three additional constraints on the learning objective to improve the plausibility of the attention map.
arXiv Detail & Related papers (2025-01-22T10:17:20Z)
- Reversed Attention: On The Gradient Descent Of Attention Layers In GPT [55.2480439325792]
We study the mathematics of the backward pass of attention, revealing that it implicitly calculates an attention matrix that we refer to as "Reversed Attention".
In an experimental setup, we showcase the ability of Reversed Attention to directly alter the forward pass of attention, without modifying the model's weights.
In addition to improving our understanding of how LMs configure their attention layers during backpropagation, Reversed Attention maps contribute to a more interpretable backward pass.
arXiv Detail & Related papers (2024-12-22T13:48:04Z)
- From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures [1.5266118210763295]
Recent developments in artificial intelligence, such as the Transformer architecture, incorporate the idea of attention into model designs.
Our review aims to provide a comparative analysis of these mechanisms from a cognitive-functional perspective.
arXiv Detail & Related papers (2024-04-25T05:13:38Z)
- Guiding Visual Question Answering with Attention Priors [76.21671164766073]
We propose to guide the attention mechanism using explicit linguistic-visual grounding.
This grounding is derived by connecting structured linguistic concepts in the query to their referents among the visual objects.
The resultant algorithm is capable of probing attention-based reasoning models, injecting relevant associative knowledge, and regulating the core reasoning process.
arXiv Detail & Related papers (2022-05-25T09:53:47Z)
- Is Sparse Attention more Interpretable? [52.85910570651047]
We investigate how sparsity affects our ability to use attention as an explainability tool.
We find that, under sparse attention, only a weak relationship exists between inputs and co-indexed intermediate representations.
We observe in this setting that inducing sparsity may make it less plausible that attention can be used as a tool for understanding model behavior.
arXiv Detail & Related papers (2021-06-02T11:42:56Z)
- Interpretability and Explainability: A Machine Learning Zoo Mini-tour [4.56877715768796]
Interpretability and explainability lie at the core of many machine learning and statistical applications in medicine, economics, law, and natural sciences.
We emphasise the divide between interpretability and explainability and illustrate these two different research directions with concrete examples of the state-of-the-art.
arXiv Detail & Related papers (2020-12-03T10:11:52Z)
- Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference [68.12511526813991]
We provide a novel understanding of multi-head attention from a Bayesian perspective.
We propose a non-parametric approach that explicitly improves the repulsiveness in multi-head attention.
Experiments on various attention models and applications demonstrate that the proposed repulsive attention can improve the learned feature diversity.
arXiv Detail & Related papers (2020-09-20T06:32:23Z)
- Why Attentions May Not Be Interpretable? [46.69116768203185]
Recent research has found that attention-as-importance interpretations often do not work as expected.
We show that one root cause of this phenomenon is shortcuts, meaning that the attention weights themselves may carry extra information.
We propose two methods to mitigate this issue.
arXiv Detail & Related papers (2020-06-10T05:08:30Z)
- Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning Models [82.3793660091354]
This paper analyzes the predictions of image captioning models with attention mechanisms beyond visualizing the attention itself.
We develop variants of layer-wise relevance propagation (LRP) and gradient-based explanation methods, tailored to image captioning models with attention mechanisms.
arXiv Detail & Related papers (2020-01-04T05:15:11Z)
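The LRP variants in the entry above are tailored to attention-based captioning models; purely as a generic illustration of the relevance-propagation idea they build on (and not the authors' formulation), the sketch below applies the standard epsilon rule to a tiny fully connected ReLU network with made-up weights.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny ReLU network for illustration only: x -> h = relu(W1 x) -> y = W2 h
W1 = rng.normal(size=(6, 4))
W2 = rng.normal(size=(3, 6))
x = rng.normal(size=4)

h = np.maximum(W1 @ x, 0.0)
y = W2 @ h

def lrp_epsilon(a, W, R, eps=1e-6):
    """Redistribute relevance R from a layer's output to its input activations a (epsilon rule)."""
    z = W @ a                           # pre-activations of the upper layer
    s = R / (z + eps * np.sign(z))      # stabilized relevance per output unit
    return a * (W.T @ s)                # relevance assigned to each input unit

# Inject unit relevance at the predicted class and propagate it back to the input.
R_out = np.zeros_like(y)
R_out[y.argmax()] = 1.0
R_hidden = lrp_epsilon(h, W2, R_out)
R_input = lrp_epsilon(x, W1, R_hidden)

print("input relevances   :", np.round(R_input, 3))
print("relevance injected :", R_out.sum(), " relevance received:", round(R_input.sum(), 4))
```

The epsilon rule redistributes each upper-layer unit's relevance to the lower-layer activations in proportion to their contribution to its pre-activation, so the relevance assigned to the input approximately sums to the relevance injected at the output.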
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.