A General Survey on Attention Mechanisms in Deep Learning
- URL: http://arxiv.org/abs/2203.14263v1
- Date: Sun, 27 Mar 2022 10:06:23 GMT
- Title: A General Survey on Attention Mechanisms in Deep Learning
- Authors: Gianni Brauwers and Flavius Frasincar
- Abstract summary: This survey provides an overview of the most important attention mechanisms proposed in the literature.
The various attention mechanisms are explained by means of a framework consisting of a general attention model, uniform notation, and a comprehensive taxonomy of attention mechanisms.
- Score: 7.5537115673774275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention is an important mechanism that can be employed for a variety of
deep learning models across many different domains and tasks. This survey
provides an overview of the most important attention mechanisms proposed in the
literature. The various attention mechanisms are explained by means of a
framework consisting of a general attention model, uniform notation, and a
comprehensive taxonomy of attention mechanisms. Furthermore, the various
measures for evaluating attention models are reviewed, and methods to
characterize the structure of attention models based on the proposed framework
are discussed. Last, future work in the field of attention models is
considered.

Related papers
- Attention in Diffusion Model: A Survey [17.11612595063082]
 This paper presents a comprehensive survey of attention within diffusion models.
We systematically analyse its roles, design patterns, and operations across different modalities and tasks.
We propose a unified taxonomy that categorises attention-related modifications into parts according to the structural components they affect.
 arXiv  Detail & Related papers  (2025-04-01T09:00:49Z)
- On the Anatomy of Attention [0.0]
 We introduce a category-theoretic diagrammatic formalism in order to systematically relate and reason about machine learning models.
Our diagrams present architectures intuitively but without loss of essential detail, where natural relationships between models are captured by graphical transformations.
 arXiv  Detail & Related papers  (2024-07-02T16:50:26Z)
- On Understanding Attention-Based In-Context Learning for Categorical Data [49.40350941996942]
 We develop a network composed of attention blocks, with each block employing a self-attention layer followed by a cross-attention layer, with associated skip connections.
This model can exactly perform multi-step functional GD inference for in-context inference with categorical observations.
 arXiv  Detail & Related papers  (2024-05-27T15:03:21Z)
- Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights [5.798431829723857]
 This paper provides a comprehensive exploration of techniques and insights for designing attention mechanisms in Vision Transformer (ViT) networks.
We present a systematic taxonomy of various attention mechanisms within ViTs, employing redesigned approaches.
The analysis includes an exploration of the novelty, strengths, weaknesses, and an in-depth evaluation of the different proposed strategies.
 arXiv  Detail & Related papers  (2024-03-28T23:31:59Z)
- Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach [50.36650300087987]
 This study presents a novel approach to Generative Class Incremental Learning (GCIL) by introducing the forgetting mechanism.
We have found that integrating the forgetting mechanisms significantly enhances the models' performance in acquiring new knowledge.
 arXiv  Detail & Related papers  (2024-03-27T05:10:38Z)
- AttentionViz: A Global View of Transformer Attention [60.82904477362676]
 We present a new visualization technique designed to help researchers understand the self-attention mechanism in transformers.
The main idea behind our method is to visualize a joint embedding of the query and key vectors used by transformer models to compute attention.
We create an interactive visualization tool, AttentionViz, based on these joint query-key embeddings.
 arXiv  Detail & Related papers  (2023-05-04T23:46:49Z)
- Attention Mechanisms in Computer Vision: A Survey [75.6074182122423]
 We provide a comprehensive review of various attention mechanisms in computer vision.
We categorize them according to approach, such as channel attention, spatial attention, temporal attention and branch attention.
We suggest future directions for attention mechanism research.
 arXiv  Detail & Related papers  (2021-11-15T09:18:40Z)
- Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
 This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
It is simple to convert any models with self-attention, including pre-trained ones, to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
 arXiv  Detail & Related papers  (2021-10-25T00:54:57Z)
- How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention [20.191319097826266]
 We cluster attention heatmaps into significantly different patterns through unsupervised clustering.
Our proposed features can be used to explain and calibrate different attention heads in Transformer models.
 arXiv  Detail & Related papers  (2020-11-02T12:52:31Z)
- Forethought and Hindsight in Credit Assignment [62.05690959741223]
 We work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models.
We investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)-evaluated.
 arXiv  Detail & Related papers  (2020-10-26T16:00:47Z)
- Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference [68.12511526813991]
 We provide a novel understanding of multi-head attention from a Bayesian perspective.
We propose a non-parametric approach that explicitly improves the repulsiveness in multi-head attention.
Experiments on various attention models and applications demonstrate that the proposed repulsive attention can improve the learned feature diversity.
 arXiv  Detail & Related papers  (2020-09-20T06:32:23Z)
- Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models [5.866941279460248]
 We propose a visual analytics approach to understanding fine-tuning in attention-based language models.
Our visualization, Attention Flows, is designed to support users in querying, tracing, and comparing attention within layers, across layers, and amongst attention heads in Transformer-based language models.
 arXiv  Detail & Related papers  (2020-09-03T19:56:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     