A General Survey on Attention Mechanisms in Deep Learning
- URL: http://arxiv.org/abs/2203.14263v1
- Date: Sun, 27 Mar 2022 10:06:23 GMT
- Title: A General Survey on Attention Mechanisms in Deep Learning
- Authors: Gianni Brauwers and Flavius Frasincar
- Abstract summary: This survey provides an overview of the most important attention mechanisms proposed in the literature.
The various attention mechanisms are explained by means of a framework consisting of a general attention model, uniform notation, and a comprehensive taxonomy of attention mechanisms.
- Score: 7.5537115673774275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention is an important mechanism that can be employed for a variety of
deep learning models across many different domains and tasks. This survey
provides an overview of the most important attention mechanisms proposed in the
literature. The various attention mechanisms are explained by means of a
framework consisting of a general attention model, uniform notation, and a
comprehensive taxonomy of attention mechanisms. Furthermore, the various
measures for evaluating attention models are reviewed, and methods to
characterize the structure of attention models based on the proposed framework
are discussed. Last, future work in the field of attention models is
considered.
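
As a rough illustration of that general model, the sketch below implements plain query-key-value attention in PyTorch; the scaled dot-product score function and all names here are assumptions made for this example, not the survey's uniform notation.

```python
import torch
import torch.nn.functional as F

def general_attention(queries, keys, values):
    """Score each query against all keys, normalize the scores into an
    attention distribution, and return the weighted average of the values.
    The scaled dot product used here is only one choice of score function;
    the survey's taxonomy covers many alternatives."""
    d = queries.size(-1)
    scores = queries @ keys.T / d ** 0.5   # query-key compatibility scores
    weights = F.softmax(scores, dim=-1)    # attention weights, rows sum to 1
    return weights @ values, weights       # context vectors plus weights

# Example: 2 queries attending over 5 key-value pairs.
q, k, v = torch.randn(2, 8), torch.randn(5, 8), torch.randn(5, 16)
context, weights = general_attention(q, k, v)
print(context.shape, weights.shape)  # torch.Size([2, 16]) torch.Size([2, 5])
```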
Related papers
- On the Anatomy of Attention [0.0]
We introduce a category-theoretic diagrammatic formalism in order to systematically relate and reason about machine learning models.
Our diagrams present architectures intuitively but without loss of essential detail, where natural relationships between models are captured by graphical transformations.
arXiv Detail & Related papers (2024-07-02T16:50:26Z)
- Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights [5.798431829723857]
This paper provides a comprehensive exploration of techniques and insights for designing attention mechanisms in Vision Transformer (ViT) networks.
We present a systematic taxonomy of attention mechanisms within ViTs, organized by the redesign approaches they employ.
The analysis covers the novelty, strengths, and weaknesses of the proposed strategies, together with an in-depth evaluation.
arXiv Detail & Related papers (2024-03-28T23:31:59Z)
- Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach [50.36650300087987]
This study presents a novel approach to Generative Class Incremental Learning (GCIL) by introducing the forgetting mechanism.
We have found that integrating the forgetting mechanism significantly enhances the model's performance in acquiring new knowledge.
arXiv Detail & Related papers (2024-03-27T05:10:38Z)
- AttentionViz: A Global View of Transformer Attention [60.82904477362676]
We present a new visualization technique designed to help researchers understand the self-attention mechanism in transformers.
The main idea behind our method is to visualize a joint embedding of the query and key vectors used by transformer models to compute attention.
We create an interactive visualization tool, AttentionViz, based on these joint query-key embeddings.
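
A minimal sketch of the joint query-key embedding idea follows, with plain PCA standing in for the tool's actual projection pipeline and random vectors standing in for activations extracted from a real transformer.

```python
import torch
from sklearn.decomposition import PCA

# Stand-in query/key vectors for one attention head; in practice these
# would be captured from a transformer's forward pass on real inputs.
queries, keys = torch.randn(64, 64), torch.randn(64, 64)

# Project queries and keys into ONE shared 2D space, so that a query
# landing near a key suggests a high attention score between them.
joint = torch.cat([queries, keys], dim=0).numpy()
coords = PCA(n_components=2).fit_transform(joint)
query_xy, key_xy = coords[:64], coords[64:]
# query_xy and key_xy can now be scatter-plotted together, colored by type.
```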
arXiv Detail & Related papers (2023-05-04T23:46:49Z)
- Attention Mechanisms in Computer Vision: A Survey [75.6074182122423]
We provide a comprehensive review of various attention mechanisms in computer vision.
We categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention.
We suggest future directions for attention mechanism research.
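
As one concrete instance of the channel-attention category named above, here is a minimal squeeze-and-excitation-style block; it is an illustrative sketch, not a particular model from the survey.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention: pool spatially,
    score each channel, and rescale the feature map accordingly."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                           # x: (batch, C, H, W)
        w = x.mean(dim=(2, 3))                      # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)  # excitation: per-channel gate
        return x * w                                # reweigh the channels

x = torch.randn(2, 32, 14, 14)
print(ChannelAttention(32)(x).shape)  # torch.Size([2, 32, 14, 14])
```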
arXiv Detail & Related papers (2021-11-15T09:18:40Z)
- Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
Any model with self-attention, including a pre-trained one, can easily be converted to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
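
As a loose illustration of the idea, the sketch below penalizes the gap between the empirical moments of a head's queries and keys; this moment matching is an assumed stand-in, not the paper's actual distribution-matching objective.

```python
import torch

def alignment_penalty(queries, keys):
    """Illustrative regularizer nudging the empirical distributions of one
    head's queries and keys toward each other by matching their first two
    moments. The paper's actual objective is a proper distribution-matching
    loss; this moment matching is an assumption for the sketch."""
    mean_gap = (queries.mean(0) - keys.mean(0)).pow(2).sum()
    var_gap = (queries.var(0) - keys.var(0)).pow(2).sum()
    return mean_gap + var_gap

q = torch.randn(128, 64)
k = torch.randn(128, 64) + 0.5   # shifted keys incur a larger penalty
print(alignment_penalty(q, k))   # add (scaled) to the task loss in training
```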
arXiv Detail & Related papers (2021-10-25T00:54:57Z)
- How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention [20.191319097826266]
We group attention heatmaps into significantly different patterns via unsupervised clustering.
Our proposed features can be used to explain and calibrate different attention heads in Transformer models.
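
A minimal sketch of such a clustering step follows, with raw flattened heatmaps standing in for the paper's distance-based features.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stack of attention heatmaps, (n_heads, seq_len, seq_len);
# in practice these would be collected from BERT's layers on a corpus.
heatmaps = np.random.rand(144, 32, 32)

# Flatten each head's map and cluster the heads into pattern groups.
flat = heatmaps.reshape(len(heatmaps), -1)
labels = KMeans(n_clusters=5, n_init=10).fit_predict(flat)
print(np.bincount(labels))  # heads per discovered attention pattern
```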
arXiv Detail & Related papers (2020-11-02T12:52:31Z)
- Forethought and Hindsight in Credit Assignment [62.05690959741223]
We work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models.
We investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)-evaluated.
arXiv Detail & Related papers (2020-10-26T16:00:47Z)
- Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference [68.12511526813991]
We provide a novel understanding of multi-head attention from a Bayesian perspective.
We propose a non-parametric approach that explicitly improves the repulsiveness in multi-head attention.
Experiments on various attention models and applications demonstrate that the proposed repulsive attention can improve the learned feature diversity.
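
As an assumed toy version of the idea, the sketch below penalizes pairwise similarity between heads' representations; the paper's non-parametric Bayesian approach differs, so treat this as an illustration only.

```python
import torch
import torch.nn.functional as F

def head_repulsion(head_outputs):
    """Toy repulsion term: penalize pairwise cosine similarity between
    per-head representations so heads are pushed toward diverse features.
    head_outputs: (H, d). Not the paper's non-parametric formulation."""
    h = F.normalize(head_outputs, dim=-1)
    sim = h @ h.T                          # pairwise cosine similarities
    off_diag = sim - torch.eye(len(h))     # zero out self-similarity
    return off_diag.pow(2).sum() / (len(h) * (len(h) - 1))

heads = torch.randn(8, 64)
print(head_repulsion(heads))  # add (scaled) to the task loss during training
```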
arXiv Detail & Related papers (2020-09-20T06:32:23Z)
- Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models [5.866941279460248]
We propose a visual analytics approach to understanding fine-tuning in attention-based language models.
Our visualization, Attention Flows, is designed to support users in querying, tracing, and comparing attention within layers, across layers, and amongst attention heads in Transformer-based language models.
arXiv Detail & Related papers (2020-09-03T19:56:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.