Beyond Self-attention: External Attention using Two Linear Layers for
Visual Tasks
- URL: http://arxiv.org/abs/2105.02358v1
- Date: Wed, 5 May 2021 22:29:52 GMT
- Title: Beyond Self-attention: External Attention using Two Linear Layers for
Visual Tasks
- Authors: Meng-Hao Guo, Zheng-Ning Liu, Tai-Jiang Mu, Shi-Min Hu
- Abstract summary: We propose a novel attention mechanism which we call external attention, based on two external, small, learnable, and shared memories.
Our method provides comparable or superior performance to the self-attention mechanism and some of its variants, with much lower computational and memory costs.
- Score: 34.32609892928909
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention mechanisms, especially self-attention, play an increasingly
important role in deep feature representation in visual tasks. Self-attention
updates the feature at each position by computing a weighted sum of features
using pair-wise affinities across all positions to capture long-range
dependency within a single sample. However, self-attention has a quadratic
complexity and ignores potential correlation between different samples. This
paper proposes a novel attention mechanism which we call external attention,
based on two external, small, learnable, and shared memories, which can be
implemented easily by simply using two cascaded linear layers and two
normalization layers; it conveniently replaces self-attention in existing
popular architectures. External attention has linear complexity and implicitly
considers the correlations between all samples. Extensive experiments on image
classification, semantic segmentation, image generation, point cloud
classification and point cloud segmentation tasks reveal that our method
provides comparable or superior performance to the self-attention mechanism and
some of its variants, with much lower computational and memory costs.
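A minimal PyTorch sketch of how such a block could look, assuming the two shared memories (M_k and M_v) are realized as cascaded bias-free linear layers and the "two normalization layers" are read as a softmax followed by an L1 normalization; the class name ExternalAttention and memory_size=64 are illustrative choices, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExternalAttention(nn.Module):
    """Sketch of external attention: two small, learnable, shared memories
    (M_k, M_v) implemented as cascaded bias-free linear layers."""

    def __init__(self, d_model: int, memory_size: int = 64):
        super().__init__()
        self.mk = nn.Linear(d_model, memory_size, bias=False)  # M_k: features -> memory slots
        self.mv = nn.Linear(memory_size, d_model, bias=False)  # M_v: memory slots -> features

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_positions, d_model); cost is linear in n_positions
        attn = self.mk(x)                                      # (B, N, S) attention map
        attn = F.softmax(attn, dim=1)                          # normalize over positions
        attn = attn / (attn.sum(dim=2, keepdim=True) + 1e-9)   # L1-normalize over memory slots
        return self.mv(attn)                                   # (B, N, d_model)

# Usage: a drop-in replacement for a self-attention block on flattened image features.
x = torch.randn(2, 1024, 256)
print(ExternalAttention(256)(x).shape)  # torch.Size([2, 1024, 256])
```

Because M_k and M_v are module parameters shared across the whole dataset rather than computed per sample, the block implicitly relates features of different samples, which is the cross-sample correlation the abstract refers to.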
Related papers
- Interactive Multi-Head Self-Attention with Linear Complexity [60.112941134420204]
We show that the interactions between cross-heads of the attention matrix enhance the information flow of the attention operation.
We propose an effective method to decompose the attention operation into query- and key-less components.
arXiv Detail & Related papers (2024-02-27T13:47:23Z)
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z)
- Coneheads: Hierarchy Aware Attention [40.685504511826885]
We introduce cone attention, a drop-in replacement for dot product attention.
Cone attention associates two points by the depth of their lowest common ancestor in a hierarchy defined by hyperbolic cones.
We show that it improves task-level performance over dot product attention and other baselines.
arXiv Detail & Related papers (2023-06-01T06:53:14Z)
- Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification [19.957957963417414]
We propose a dual cross-attention learning (DCAL) algorithm to coordinate with self-attention learning.
First, we propose global-local cross-attention (GLCA) to enhance the interactions between global images and local high-response regions.
Second, we propose pair-wise cross-attention (PWCA) to establish the interactions between image pairs.
arXiv Detail & Related papers (2022-05-04T16:14:26Z)
- Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z)
- Self-Attention Neural Bag-of-Features [103.70855797025689]
We build on the recently introduced 2D-Attention and reformulate the attention learning methodology.
We propose a joint feature-temporal attention mechanism that learns a joint 2D attention mask highlighting relevant information.
arXiv Detail & Related papers (2022-01-26T17:54:14Z)
- Compositional Attention: Disentangling Search and Retrieval [66.7108739597771]
Multi-head, key-value attention is the backbone of the Transformer model and its variants.
Standard attention heads learn a rigid mapping between search and retrieval.
We propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure.
arXiv Detail & Related papers (2021-10-18T15:47:38Z)
- Instance-aware Remote Sensing Image Captioning with Cross-hierarchy Attention [11.23821696220285]
Spatial attention is a straightforward way to enhance performance in remote sensing image captioning.
We propose a remote sensing image caption generator with instance-awareness and cross-hierarchy attention.
arXiv Detail & Related papers (2021-05-11T12:59:07Z)
- Capturing Multi-Resolution Context by Dilated Self-Attention [58.69803243323346]
We propose a combination of restricted self-attention and a dilation mechanism, which we refer to as dilated self-attention.
The restricted self-attention allows attention to neighboring frames of the query at a high resolution, and the dilation mechanism summarizes distant information to allow attending to it with a lower resolution.
ASR results demonstrate substantial improvements compared to restricted self-attention alone, achieving results similar to full-sequence self-attention with a fraction of the computational cost (a rough illustrative sketch of this local-plus-coarse idea follows after the list).
arXiv Detail & Related papers (2021-04-07T02:04:18Z)
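The last entry above describes attending to nearby frames at full resolution while summarizing distant frames at a coarser resolution. As a purely illustrative sketch of that idea (not the authors' implementation), one could average-pool distant context and concatenate it with a local window before a plain dot-product attention; the function name dilated_self_attention, the pooling summarizer, the absence of learned projections, and the window/dilation values are all assumptions made here.

```python
import torch
import torch.nn.functional as F

def dilated_self_attention(x: torch.Tensor, window: int = 4, dilation: int = 4) -> torch.Tensor:
    """Toy sketch: each query attends to its local neighborhood at full
    resolution plus an average-pooled summary of the whole sequence."""
    B, T, D = x.shape
    # Coarse summary of the sequence (low resolution, stride = dilation)
    coarse = F.avg_pool1d(x.transpose(1, 2), kernel_size=dilation, stride=dilation).transpose(1, 2)
    outputs = []
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        keys = torch.cat([x[:, lo:hi], coarse], dim=1)          # local (full res) + distant (coarse)
        q = x[:, t:t + 1]                                       # (B, 1, D)
        scores = torch.softmax(q @ keys.transpose(1, 2) / D ** 0.5, dim=-1)
        outputs.append(scores @ keys)
    return torch.cat(outputs, dim=1)                            # (B, T, D)

# Each query sees window + T / dilation keys instead of T, i.e. sub-quadratic cost overall.
x = torch.randn(2, 16, 32)
print(dilated_self_attention(x, window=2, dilation=4).shape)  # torch.Size([2, 16, 32])
```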
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.