Attention as Activation
- URL: http://arxiv.org/abs/2007.07729v2
- Date: Sun, 2 Aug 2020 09:40:56 GMT
- Title: Attention as Activation
- Authors: Yimian Dai and Stefan Oehmcke and Fabian Gieseke and Yiquan Wu and Kobus Barnard
- Abstract summary: We propose a novel type of activation units called attentional activation (ATAC) units as a unification of activation functions and attention mechanisms.
By replacing the well-known rectified linear units with such ATAC units in convolutional networks, we can construct fully attentional networks that perform significantly better with only a modest number of additional parameters.
- Score: 4.265244011052538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Activation functions and attention mechanisms are typically treated as having
different purposes and have evolved differently. However, both concepts can be
formulated as a non-linear gating function. Inspired by their similarity, we
propose a novel type of activation units called attentional activation (ATAC)
units as a unification of activation functions and attention mechanisms. In
particular, we propose a local channel attention module for the simultaneous
non-linear activation and element-wise feature refinement, which locally
aggregates point-wise cross-channel feature contexts. By replacing the
well-known rectified linear units by such ATAC units in convolutional networks,
we can construct fully attentional networks that perform significantly better
with a modest number of additional parameters. We conducted detailed ablation
studies on the ATAC units using several host networks with varying network
depths to empirically verify the effectiveness and efficiency of the units.
Furthermore, we compared the performance of the ATAC units against existing
activation functions as well as other attention mechanisms on the CIFAR-10,
CIFAR-100, and ImageNet datasets. Our experimental results show that networks
constructed with the proposed ATAC units generally yield performance gains over
their competitors given a comparable number of parameters.
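The abstract's key observation is that both a rectified linear unit and a channel attention module can be written as a non-linear gate of the form y = x * g(x); the ATAC unit realizes that gate with a local channel attention module built from point-wise convolutions. The PyTorch sketch below illustrates one way such a unit could look as a drop-in replacement for ReLU; the two-layer 1x1-convolution bottleneck, the batch normalization placement, and the reduction ratio r are illustrative assumptions rather than details quoted from the paper.

```python
import torch
import torch.nn as nn


class ATAC(nn.Module):
    """Sketch of an attentional activation (ATAC) unit.

    The gate is computed per spatial location from cross-channel context
    using only point-wise (1x1) convolutions, i.e. no global pooling, so
    the unit both activates non-linearly and refines features element-wise.
    The bottleneck structure and reduction ratio `r` are assumptions.
    """

    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        hidden = max(channels // r, 1)
        self.gate = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),   # aggregate point-wise cross-channel context
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),   # restore the channel dimension
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),                                 # soft gate in (0, 1) for every element
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Non-linear activation and element-wise feature refinement in one step.
        return x * self.gate(x)


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)
    print(ATAC(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```

Swapping every ReLU in a host network (for example, inside ResNet blocks) for such a unit is what the abstract calls a fully attentional network; the extra cost is the parameters of the two 1x1 convolutions per activation.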
Related papers
- LoFLAT: Local Feature Matching using Focused Linear Attention Transformer [36.53651224633837]
We propose LoFLAT, a novel Local Feature matching method using a Focused Linear Attention Transformer.
Our LoFLAT consists of three main modules: the Feature Extraction Module, the Feature Transformer Module, and the Matching Module.
The proposed LoFLAT outperforms the LoFTR method in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2024-10-30T05:38:07Z) - Towards Robust Semantic Segmentation against Patch-based Attack via Attention Refinement [68.31147013783387]
We observe that the attention mechanism is vulnerable to patch-based adversarial attacks.
In this paper, we propose a Robust Attention Mechanism (RAM) to improve the robustness of the semantic segmentation model.
arXiv Detail & Related papers (2024-01-03T13:58:35Z) - Associative Transformer [26.967506484952214]
We propose Associative Transformer (AiT) to enhance the association among sparsely attended input patches.
AiT requires significantly fewer parameters and attention layers while outperforming Vision Transformers and a broad range of sparse Transformers.
arXiv Detail & Related papers (2023-09-22T13:37:10Z) - Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks.
Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs.
We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner.
arXiv Detail & Related papers (2023-06-19T23:10:02Z) - Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis [93.0013343535411]
We propose a novel type of analysis called Multi-Scale Class Representational Response Similarity Analysis (ClassRepSim).
We show that adding STAC modules to ResNet-style architectures can result in up to a 1.6% increase in top-1 accuracy.
Results from ClassRepSim analysis can be used to select an effective parameterization of the STAC module resulting in competitive performance.
arXiv Detail & Related papers (2023-06-16T18:29:26Z) - ASR: Attention-alike Structural Re-parameterization [53.019657810468026]
We propose a simple yet effective attention-alike structural re-parameterization (ASR) that achieves structural re-parameterization for a given network while enjoying the effectiveness of the attention mechanism.
In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, the Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training.
arXiv Detail & Related papers (2023-04-13T08:52:34Z) - A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z) - Self-Supervised Implicit Attention: Guided Attention by The Model Itself [1.3406858660972554]
We propose Self-Supervised Implicit Attention (SSIA), a new approach that adaptively guides deep neural network models to gain attention by exploiting the properties of the models themselves.
SSIA is a novel attention mechanism that does not require any extra parameters, computation, or memory access costs during inference.
Our implementation will be available on GitHub.
arXiv Detail & Related papers (2022-06-15T10:13:34Z) - TDAN: Top-Down Attention Networks for Enhanced Feature Selectivity in CNNs [18.24779045808196]
We propose a lightweight top-down (TD) attention module that iteratively generates a "visual searchlight" to perform top-down channel and spatial modulation of its inputs.
Our models are more robust to changes in input resolution during inference and learn to "shift attention" by localizing individual objects or features at each computation step without any explicit supervision.
arXiv Detail & Related papers (2021-11-26T12:35:17Z) - Class Semantics-based Attention for Action Detection [10.69685258736244]
Action localization networks are often structured as a feature encoder sub-network and a localization sub-network.
We propose a novel attention mechanism, the Class Semantics-based Attention (CSA), that learns from the temporal distribution of semantics of action classes present in an input video.
Our attention mechanism outperforms prior self-attention modules, such as squeeze-and-excitation, on the action detection task.
arXiv Detail & Related papers (2021-09-06T17:22:46Z) - DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search [55.164053971213576]
Convolutional neural networks have achieved great success in computer vision tasks, despite their large computation overhead.
Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure.
Existing structured pruning methods require hand-crafted rules, which may lead to a tremendous pruning space.
arXiv Detail & Related papers (2020-11-04T07:43:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.