Switchable Self-attention Module
- URL: http://arxiv.org/abs/2209.05680v1
- Date: Tue, 13 Sep 2022 01:19:38 GMT
- Title: Switchable Self-attention Module
- Authors: Shanshan Zhong, Wushao Wen, Jinghui Qin
- Abstract summary: We propose a self-attention module, SEM.
Based on the input to the attention module and a set of alternative attention operators, SEM automatically decides which operators to select and how to integrate them to compute attention maps.
The effectiveness of SEM is demonstrated by extensive experiments on widely used benchmark datasets and popular self-attention networks.
- Score: 3.8992324495848356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The attention mechanism has achieved great success in visual
recognition. Many works aim to improve its effectiveness by carefully designing
the structure of the attention operator. These works require extensive
experiments to find the optimal settings whenever the scenario changes, which
consumes considerable time and computational resources. In addition, a neural
network usually contains many layers, yet most studies apply the same attention
module to every layer, which limits further performance gains from the
self-attention mechanism. To address these problems, we propose a
self-attention module, SEM. Based on the input to the attention module and a
set of alternative attention operators, SEM automatically decides which
operators to select and how to integrate them to compute attention maps. The
effectiveness of SEM is demonstrated by extensive experiments on widely used
benchmark datasets and popular self-attention networks.
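The abstract describes SEM only at a high level: the module looks at its input and at a pool of candidate attention operators, then selects and blends them into one attention map. The plain-Python sketch below illustrates that general idea under explicit assumptions. The candidate operators (sigmoid gates over global average and max pooling, plus an identity "no attention" option), the linear softmax selector driven by a pooled channel descriptor, and every name in it (`switchable_attention`, `OPERATORS`, `sel_w`, `sel_b`) are hypothetical illustrations, not the operators or selector defined in the paper.

```python
import math
from typing import Callable, List, Tuple

# A feature map is C channels, each a flattened list of H*W activations.
FeatureMap = List[List[float]]


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def avg_pool(x: FeatureMap) -> List[float]:
    return [sum(ch) / len(ch) for ch in x]


# Hypothetical candidate operators: each returns one attention weight per channel.
def op_avg_gate(x: FeatureMap) -> List[float]:
    return [sigmoid(v) for v in avg_pool(x)]


def op_max_gate(x: FeatureMap) -> List[float]:
    return [sigmoid(max(ch)) for ch in x]


def op_identity(x: FeatureMap) -> List[float]:
    return [1.0] * len(x)  # "no attention": leave every channel unchanged


OPERATORS: List[Callable[[FeatureMap], List[float]]] = [op_avg_gate, op_max_gate, op_identity]


def switchable_attention(
    x: FeatureMap, sel_w: List[List[float]], sel_b: List[float]
) -> Tuple[FeatureMap, List[float]]:
    """Blend candidate attention maps using input-dependent softmax weights."""
    assert len(sel_w) == len(sel_b) == len(OPERATORS)
    desc = avg_pool(x)  # input descriptor that drives the selection
    logits = [sum(w * d for w, d in zip(row, desc)) + b for row, b in zip(sel_w, sel_b)]
    probs = softmax(logits)  # soft "switch" over the candidate operators
    maps = [op(x) for op in OPERATORS]
    att = [sum(p * m[c] for p, m in zip(probs, maps)) for c in range(len(x))]
    out = [[a * v for v in ch] for a, ch in zip(att, x)]  # reweight each channel
    return out, probs


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]]  # C=2 channels, 3 positions each
    w = [[0.5, -0.5], [0.0, 1.0], [-1.0, 0.0]]  # illustrative, untrained selector
    out, probs = switchable_attention(x, w, [0.0, 0.0, 0.0])
    print("operator weights:", probs)
    print("output:", out)
```

Replacing the softmax with an argmax over `probs` would turn the soft blend into a hard switch that picks a single operator; the soft version keeps the selector differentiable during training. That is one common design choice, not necessarily the one SEM makes.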
Related papers
- A Primal-Dual Framework for Transformers and Neural Networks [52.814467832108875]
Self-attention is key to the remarkable success of transformers in sequence modeling tasks.
We show that self-attention corresponds to the support vector expansion derived from a support vector regression problem.
We propose two new attention mechanisms: Batch Normalized Attention (Attention-BN) and Attention with Scaled Head (Attention-SH).
arXiv Detail & Related papers (2024-06-19T19:11:22Z)
- A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
- Self-Supervised Implicit Attention: Guided Attention by The Model Itself [1.3406858660972554]
We propose Self-Supervised Implicit Attention (SSIA), a new approach that adaptively guides deep neural network models to gain attention by exploiting the properties of the models themselves.
SSIA is a novel attention mechanism that requires no extra parameters, computation, or memory access during inference.
Our implementation will be available on GitHub.
arXiv Detail & Related papers (2022-06-15T10:13:34Z)
- Assessing the Impact of Attention and Self-Attention Mechanisms on the Classification of Skin Lesions [0.0]
We focus on two forms of attention mechanisms: attention modules and self-attention.
Attention modules are used to reweight the features of each layer's input tensor.
Self-attention, originally proposed in natural language processing, makes it possible to relate all items in an input sequence.
arXiv Detail & Related papers (2021-12-23T18:02:48Z)
- TDAN: Top-Down Attention Networks for Enhanced Feature Selectivity in CNNs [18.24779045808196]
We propose a lightweight top-down (TD) attention module that iteratively generates a "visual searchlight" to perform top-down channel and spatial modulation of its inputs.
Our models are more robust to changes in input resolution during inference and learn to "shift attention" by localizing individual objects or features at each computation step without any explicit supervision.
arXiv Detail & Related papers (2021-11-26T12:35:17Z)
- Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
It is simple to convert any model with self-attention, including pre-trained ones, to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-10-25T00:54:57Z)
- Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and state-of-the-art attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-06-09T17:46:22Z)
- Attention in Attention Network for Image Super-Resolution [18.2279472158217]
We quantify and visualize the static attention mechanisms and show that not all attention modules are equally beneficial.
We propose the attention in attention network (A$^2$N) for highly accurate image super-resolution (SR).
Our model can achieve a better performance trade-off than state-of-the-art lightweight networks.
arXiv Detail & Related papers (2021-04-19T17:59:06Z)
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Attention map visualization of a pre-trained model is one direct method for understanding self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can also be used to guide the design of SparseBERT.
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
- Deep Reinforced Attention Learning for Quality-Aware Visual Recognition [73.15276998621582]
We build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural network.
We introduce a meta critic network to evaluate the quality of attention maps in the main network.
arXiv Detail & Related papers (2020-07-13T02:44:38Z)