Attention Flows: Analyzing and Comparing Attention Mechanisms in
Language Models
- URL: http://arxiv.org/abs/2009.07053v1
- Date: Thu, 3 Sep 2020 19:56:30 GMT
- Title: Attention Flows: Analyzing and Comparing Attention Mechanisms in
Language Models
- Authors: Joseph F DeRose, Jiayao Wang, and Matthew Berger
- Abstract summary: We propose a visual analytics approach to understanding fine-tuning in attention-based language models.
Our visualization, Attention Flows, is designed to support users in querying, tracing, and comparing attention within layers, across layers, and amongst attention heads in Transformer-based language models.
- Score: 5.866941279460248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advances in language modeling have led to the development of deep
attention-based models that are performant across a wide variety of natural
language processing (NLP) problems. These language models are typified by a
pre-training process on large unlabeled text corpora and subsequently
fine-tuned for specific tasks. Although considerable work has been devoted to
understanding the attention mechanisms of pre-trained models, it is less
understood how a model's attention mechanisms change when trained for a target
NLP task. In this paper, we propose a visual analytics approach to
understanding fine-tuning in attention-based language models. Our
visualization, Attention Flows, is designed to support users in querying,
tracing, and comparing attention within layers, across layers, and amongst
attention heads in Transformer-based language models. To help users gain
insight on how a classification decision is made, our design is centered on
depicting classification-based attention at the deepest layer and how attention
from prior layers flows throughout words in the input. Attention Flows supports
the analysis of a single model, as well as the visual comparison between
pre-trained and fine-tuned models via their similarities and differences. We
use Attention Flows to study attention mechanisms in various sentence
understanding tasks and highlight how attention evolves to address the nuances
of solving these tasks.
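The cross-layer tracing described in the abstract is closely related to attention rollout, a common way to estimate how attention at the deepest layer flows back to individual input words by composing per-layer attention maps. Below is a minimal sketch of that general idea, assuming head-averaged attention and a 0.5-weighted residual term; it is illustrative only and not necessarily the exact aggregation used by Attention Flows.

```python
import numpy as np

def attention_rollout(layer_attentions):
    """Compose per-layer attention maps into an input-to-output flow.

    layer_attentions: list with one array per layer, each of shape
        (num_heads, seq_len, seq_len), rows summing to 1.
    Returns a (seq_len, seq_len) array whose row i estimates how much
    position i at the deepest layer draws on each input token.
    """
    rollout = None
    for attn in layer_attentions:
        a = attn.mean(axis=0)                    # average over heads (assumption)
        n = a.shape[-1]
        a = 0.5 * a + 0.5 * np.eye(n)            # account for the residual connection
        a = a / a.sum(axis=-1, keepdims=True)    # renormalize rows
        rollout = a if rollout is None else a @ rollout
    return rollout
```

Reading off the row for a [CLS]-style classification token then gives an estimate of which input words the classification decision ultimately depends on, which is the kind of deepest-layer, classification-centered view the abstract describes.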
Related papers
- Naturalness of Attention: Revisiting Attention in Code Language Models [3.756550107432323]
Language models for code such as CodeBERT offer the capability to learn advanced source code representations, but their opacity poses barriers to understanding the properties they capture.
This study aims to shed some light on the previously ignored factors of the attention mechanism beyond the attention weights.
arXiv Detail & Related papers (2023-11-22T16:34:12Z)
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model learns from self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- On the Interpretability of Attention Networks [1.299941371793082]
We show how an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training.
We evaluate a few attention model learning algorithms designed to encourage sparsity and demonstrate that these algorithms help improve interpretability.
arXiv Detail & Related papers (2022-12-30T15:31:22Z)
- Guiding Visual Question Answering with Attention Priors [76.21671164766073]
We propose to guide the attention mechanism using explicit linguistic-visual grounding.
This grounding is derived by connecting structured linguistic concepts in the query to their referents among the visual objects.
The resultant algorithm is capable of probing attention-based reasoning models, injecting relevant associative knowledge, and regulating the core reasoning process.
arXiv Detail & Related papers (2022-05-25T09:53:47Z)
- Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
Any model with self-attention, including pre-trained ones, can easily be converted to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
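As a rough illustration of the distribution-matching idea, the sketch below adds a moment-matching penalty between one head's projected queries and keys. It is only a stand-in under stated assumptions: the actual alignment attention objective is more principled, and the function name and penalty form here are hypothetical.

```python
import torch

def alignment_penalty(q, k):
    """Hypothetical moment-matching penalty nudging a head's query and
    key distributions toward each other.

    q, k: tensors of shape (seq_len, head_dim) holding one head's
        projected queries and keys.
    """
    mean_gap = (q.mean(dim=0) - k.mean(dim=0)).pow(2).sum()
    cov_gap = (torch.cov(q.T) - torch.cov(k.T)).pow(2).sum()
    return mean_gap + cov_gap

# Usage sketch: total_loss = task_loss + weight * sum of alignment_penalty
# over all heads and layers, leaving the rest of the model unchanged.
```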
arXiv Detail & Related papers (2021-10-25T00:54:57Z)
- Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and state-of-the-art attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-06-09T17:46:22Z)
- Improve the Interpretability of Attention: A Fast, Accurate, and Interpretable High-Resolution Attention Model [6.906621279967867]
We propose a novel Bilinear Representative Non-Parametric Attention (BR-NPA) strategy that captures task-relevant, human-interpretable information.
The proposed model can be easily adapted to a wide variety of modern deep models that involve classification.
It is also more accurate and faster than typical neural attention modules, and has a smaller memory footprint.
arXiv Detail & Related papers (2021-06-04T15:57:37Z)
- Dodrio: Exploring Transformer Models with Interactive Visualization [10.603327364971559]
Dodrio is an open-source interactive visualization tool to help NLP researchers and practitioners analyze attention mechanisms in transformer-based models with linguistic knowledge.
To facilitate the visual comparison of attention weights and linguistic knowledge, Dodrio applies different graph visualization techniques to represent attention weights with longer input text.
arXiv Detail & Related papers (2021-03-26T17:39:37Z)
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Visualizing the attention maps of a pre-trained model is one direct way to understand the self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
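The sketch below illustrates the general idea behind a differentiable mask over attention scores: a learnable soft gate per position pair, applied to the pre-softmax attention logits and pushed toward sparsity by a penalty term. The class name, sigmoid gating, and penalty are assumptions for illustration, not the exact DAM algorithm from the paper.

```python
import torch
import torch.nn as nn

class SoftAttentionMask(nn.Module):
    """Learnable soft mask over attention score positions (illustrative)."""

    def __init__(self, seq_len, temperature=1.0):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(seq_len, seq_len))
        self.temperature = temperature

    def forward(self, scores):
        # scores: (batch, heads, seq_len, seq_len), pre-softmax attention logits
        gate = torch.sigmoid(self.logits / self.temperature)  # soft 0/1 mask
        masked = scores + torch.log(gate + 1e-9)               # closed gates -> very negative logits
        return masked, gate

# Adding a sparsity term such as gate.mean() to the training loss encourages
# most gates to close, yielding a sparse attention pattern that can then
# inform a fixed sparse design.
```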
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
- Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models [65.19308052012858]
Recent Transformer-based large-scale pre-trained models have revolutionized vision-and-language (V+L) research.
We present VALUE, a set of meticulously designed probing tasks to decipher the inner workings of multimodal pre-training.
Key observations: Pre-trained models exhibit a propensity for attending over text rather than images during inference.
arXiv Detail & Related papers (2020-05-15T01:06:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.