Towards Understanding the Effectiveness of Attention Mechanism
- URL: http://arxiv.org/abs/2106.15067v1
- Date: Tue, 29 Jun 2021 02:58:59 GMT
- Title: Towards Understanding the Effectiveness of Attention Mechanism
- Authors: Xiang Ye and Zihang He and Heng Wang and Yong Li
- Abstract summary: We find only a weak consistency between the attention weights of features and their importance.
The high-order non-linearity introduced by feature map multiplication plays a regularization role on CNNs.
We design the feature map multiplication network (FMMNet) by simply replacing the feature map addition in ResNet with feature map multiplication.
- Score: 7.809333418199897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The attention mechanism is a widely used method for improving the
performance of convolutional neural networks (CNNs) on computer vision tasks.
Despite its pervasiveness, we have a poor understanding of where its
effectiveness stems from. It is popularly attributed to the visual attention
explanation, which advocates focusing on the important parts of the input
rather than ingesting the entire input. In this paper, we find only a weak
consistency between the attention weights of features and their importance.
Instead, we verify the crucial role of feature map multiplication in the
attention mechanism and uncover a fundamental impact of feature map
multiplication on the learned landscapes of CNNs: through the high-order
non-linearity it introduces, feature map multiplication plays a regularization
role on CNNs, making them learn smoother and more stable landscapes near real
samples than vanilla CNNs. This smoothness and stability induce more
predictive and stable behavior between real samples and make CNNs generalize
better. Moreover, motivated by this account of the effectiveness of feature
map multiplication, we design the feature map multiplication network (FMMNet)
by simply replacing the feature map addition in ResNet with feature map
multiplication. FMMNet outperforms ResNet on various datasets, indicating that
feature map multiplication plays a vital role in improving performance even
without the finely designed attention mechanisms of existing methods.
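To make the architectural change concrete, the block below is a minimal PyTorch sketch of a residual-style unit in which the additive skip connection of ResNet is replaced by element-wise feature map multiplication, as described for FMMNet above. The layer choices (3x3 convolutions, batch normalization, ReLU placement, channel-preserving shapes) are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class FMMBlock(nn.Module):
    """Residual-style block whose two branches are combined by element-wise
    multiplication instead of the addition used in a standard ResNet block.
    Minimal sketch under assumed layer choices; not the authors' code."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # A ResNet block would return relu(x + out); feature map multiplication
        # combines the identity branch and the convolutional branch instead.
        return self.relu(x * out)
```

Because the identity branch and the convolutional branch are multiplied, the block's output is a higher-order function of its input, which is the extra non-linearity the abstract credits with the smoothing and regularization effect.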
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Understanding Self-attention Mechanism via Dynamical System Perspective [58.024376086269015]
Self-attention mechanism (SAM) is widely used in various fields of artificial intelligence.
We show that the intrinsic stiffness phenomenon (SP) found in high-precision solutions of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NNs).
We show that the SAM is also a stiffness-aware step size adaptor that can enhance the model's representational ability to measure intrinsic SP.
arXiv Detail & Related papers (2023-08-19T08:17:41Z) - Influencer Detection with Dynamic Graph Neural Networks [56.1837101824783]
We investigate different dynamic Graph Neural Network (GNN) configurations for influencer detection.
We show that using deep multi-head attention in GNN and encoding temporal attributes significantly improves performance.
arXiv Detail & Related papers (2022-11-15T13:00:25Z) - A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z) - Invariant Causal Mechanisms through Distribution Matching [86.07327840293894]
In this work we provide a causal perspective and a new algorithm for learning invariant representations.
Empirically we show that this algorithm works well on a diverse set of tasks and in particular we observe state-of-the-art performance on domain generalization.
arXiv Detail & Related papers (2022-06-23T12:06:54Z) - Self-Supervised Implicit Attention: Guided Attention by The Model Itself [1.3406858660972554]
We propose Self-Supervised Implicit Attention (SSIA), a new approach that adaptively guides deep neural network models to gain attention by exploiting the properties of the models themselves.
SSIA is a novel attention mechanism that does not require any extra parameters, computation, or memory access costs during inference.
Our implementation will be available on GitHub.
arXiv Detail & Related papers (2022-06-15T10:13:34Z) - TDAN: Top-Down Attention Networks for Enhanced Feature Selectivity in CNNs [18.24779045808196]
We propose a lightweight top-down (TD) attention module that iteratively generates a "visual searchlight" to perform top-down channel and spatial modulation of its inputs.
Our models are more robust to changes in input resolution during inference and learn to "shift attention" by localizing individual objects or features at each computation step without any explicit supervision.
arXiv Detail & Related papers (2021-11-26T12:35:17Z) - Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and state-of-the-art attention in accuracy, uncertainty estimation, generalization across domains, and adversarial attacks.
arXiv Detail & Related papers (2021-06-09T17:46:22Z) - LFI-CAM: Learning Feature Importance for Better Visual Explanation [31.743421292094308]
Class Activation Mapping (CAM) is a powerful technique used to understand the decision-making of Convolutional Neural Networks (CNNs) in computer vision.
We propose a novel architecture, LFI-CAM, which is trainable for image classification and visual explanation in an end-to-end manner.
arXiv Detail & Related papers (2021-05-03T15:12:21Z) - Multi-stage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing Images [9.398340832493457]
We propose a Linear Attention Mechanism (LAM) to address the high computational cost of dot-product attention.
LAM is approximately equivalent to dot-product attention while being far more computationally efficient (a generic sketch of linear attention appears after this list).
We design a Multi-stage Attention ResU-Net for semantic segmentation from fine-resolution remote sensing images.
arXiv Detail & Related papers (2020-11-29T07:24:21Z) - Non-Linearities Improve OrigiNet based on Active Imaging for Micro Expression Recognition [8.112868317921853]
We introduce an active imaging concept to segregate active changes in expressive regions of a video into a single frame.
We propose a shallow CNN, the hybrid local receptive field based augmented learning network (OrigiNet), which efficiently learns significant features of micro-expressions in a video.
arXiv Detail & Related papers (2020-05-16T13:44:49Z)
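As a side note on the Linear Attention Mechanism (LAM) entry above: linear-complexity attention mechanisms typically approximate dot-product attention by replacing the softmax with a kernel feature map and reassociating the matrix products so that the length-by-length attention matrix is never formed. The sketch below is a generic kernelized linear attention in PyTorch with an assumed elu(x) + 1 feature map; it illustrates that general idea and is not necessarily the exact formulation of the LAM paper.

```python
import torch
import torch.nn.functional as F


def linear_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Kernelized linear attention: phi(Q) @ (phi(K)^T @ V), normalized per query.

    q, k: (batch, length, dim); v: (batch, length, dim_v).
    phi(x) = elu(x) + 1 is an assumed, commonly used feature map, not
    necessarily the one used by LAM.
    """
    phi_q = F.elu(q) + 1                                  # (B, N, D)
    phi_k = F.elu(k) + 1                                  # (B, N, D)
    # Contracting over the length axis first keeps intermediates at D x D_v,
    # so the cost grows linearly in N rather than quadratically.
    kv = torch.einsum("bnd,bne->bde", phi_k, v)           # (B, D, D_v)
    norm = torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1)) + 1e-6  # (B, N)
    return torch.einsum("bnd,bde->bne", phi_q, kv) / norm.unsqueeze(-1)
```

Standard softmax attention materializes an N x N matrix softmax(QK^T) before multiplying by V; the reassociation above is what makes an approximately equivalent but computationally efficient attention possible.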