Towards Understanding the Effectiveness of Attention Mechanism
- URL: http://arxiv.org/abs/2106.15067v1
- Date: Tue, 29 Jun 2021 02:58:59 GMT
- Title: Towards Understanding the Effectiveness of Attention Mechanism
- Authors: Xiang Ye and Zihang He and Heng Wang and Yong Li
- Abstract summary: We find only a weak consistency between the attention weights of features and their importance.
The high-order non-linearity introduced by feature map multiplication plays a regularization role on CNNs.
We design the feature map multiplication network (FMMNet) by simply replacing the feature map addition in ResNet with feature map multiplication.
- Score: 7.809333418199897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The attention mechanism is a widely used method for improving the
performance of convolutional neural networks (CNNs) on computer vision tasks.
Despite its pervasiveness, we have a poor understanding of where its
effectiveness stems from. It is popularly attributed to the visual attention
explanation, which advocates focusing on the important parts of the input
rather than ingesting the entire input. In this paper, we find only a weak
consistency between the attention weights of features and their importance.
Instead, we verify the crucial role of feature map multiplication in the
attention mechanism and uncover a fundamental impact of feature map
multiplication on the learned landscapes of CNNs: through the high-order
non-linearity it introduces, feature map multiplication plays a regularization
role on CNNs, making them learn smoother and more stable landscapes near real
samples than vanilla CNNs. This smoothness and stability induce more
predictive and stable behavior between real samples and make CNNs generalize
better. Moreover, motivated by this account of the effectiveness of feature
map multiplication, we design the feature map multiplication network (FMMNet)
by simply replacing the feature map addition in ResNet with feature map
multiplication. FMMNet outperforms ResNet on various datasets, indicating that
feature map multiplication plays a vital role in improving performance even
without the finely designed attention mechanisms of existing methods.
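To make the architectural change concrete, the block below is a minimal PyTorch sketch of a residual-style unit in which the additive skip connection of ResNet is replaced by element-wise feature map multiplication, as described for FMMNet above. The layer choices (3x3 convolutions, batch normalization, ReLU placement, channel-preserving shapes) are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class FMMBlock(nn.Module):
    """Residual-style block whose two branches are combined by element-wise
    multiplication instead of the addition used in a standard ResNet block.
    Minimal sketch under assumed layer choices; not the authors' code."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # A ResNet block would return relu(x + out); feature map multiplication
        # combines the identity branch and the convolutional branch instead.
        return self.relu(x * out)
```

Because the identity branch and the convolutional branch are multiplied, the block's output is a higher-order function of its input, which is the extra non-linearity the abstract credits with the smoothing and regularization effect.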
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Understanding Self-attention Mechanism via Dynamical System Perspective [58.024376086269015]
Self-attention mechanism (SAM) is widely used in various fields of artificial intelligence.
We show that the intrinsic stiffness phenomenon (SP) found in high-precision solutions of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NNs).
We show that the SAM is also a stiffness-aware step size adaptor that can enhance the model's representational ability to measure intrinsic SP.
arXiv Detail & Related papers (2023-08-19T08:17:41Z) - Influencer Detection with Dynamic Graph Neural Networks [56.1837101824783]
We investigate different dynamic Graph Neural Network (GNN) configurations for influencer detection.
We show that using deep multi-head attention in GNN and encoding temporal attributes significantly improves performance.
arXiv Detail & Related papers (2022-11-15T13:00:25Z) - A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z) - Invariant Causal Mechanisms through Distribution Matching [86.07327840293894]
In this work we provide a causal perspective and a new algorithm for learning invariant representations.
Empirically we show that this algorithm works well on a diverse set of tasks and in particular we observe state-of-the-art performance on domain generalization.
arXiv Detail & Related papers (2022-06-23T12:06:54Z) - Self-Supervised Implicit Attention: Guided Attention by The Model Itself [1.3406858660972554]
We propose Self-Supervised Implicit Attention (SSIA), a new approach that adaptively guides deep neural network models to gain attention by exploiting the properties of the models themselves.
SSIA is a novel attention mechanism that does not require any extra parameters, computation, or memory access costs during inference.
Our implementation will be available on GitHub.
arXiv Detail & Related papers (2022-06-15T10:13:34Z) - TDAN: Top-Down Attention Networks for Enhanced Feature Selectivity in CNNs [18.24779045808196]
We propose a lightweight top-down (TD) attention module that iteratively generates a "visual searchlight" to perform top-down channel and spatial modulation of its inputs.
Our models are more robust to changes in input resolution during inference and learn to "shift attention" by localizing individual objects or features at each computation step without any explicit supervision.
arXiv Detail & Related papers (2021-11-26T12:35:17Z) - Bayesian Attention Belief Networks [59.183311769616466]
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks.
This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights.
We show that our method outperforms deterministic attention and state-of-the-art attention in accuracy, uncertainty estimation, generalization across domains, and adversarial attacks.
arXiv Detail & Related papers (2021-06-09T17:46:22Z) - LFI-CAM: Learning Feature Importance for Better Visual Explanation [31.743421292094308]
Class Activation Mapping (CAM) is a powerful technique used to understand the decision-making of Convolutional Neural Networks (CNNs) in computer vision.
We propose a novel architecture, LFI-CAM, which is trainable for image classification and visual explanation in an end-to-end manner.
arXiv Detail & Related papers (2021-05-03T15:12:21Z) - Multi-stage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing Images [9.398340832493457]
We propose a Linear Attention Mechanism (LAM) to address the high computational cost of dot-product attention.
LAM is approximately equivalent to dot-product attention while being far more computationally efficient (a generic sketch of linear attention appears after this list).
We design a Multi-stage Attention ResU-Net for semantic segmentation from fine-resolution remote sensing images.
arXiv Detail & Related papers (2020-11-29T07:24:21Z) - Non-Linearities Improve OrigiNet based on Active Imaging for Micro Expression Recognition [8.112868317921853]
We introduce an active imaging concept to segregate active changes in expressive regions of a video into a single frame.
We propose a shallow CNN, the hybrid local receptive field based augmented learning network (OrigiNet), which efficiently learns significant features of micro-expressions in a video.
arXiv Detail & Related papers (2020-05-16T13:44:49Z)
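As a side note on the Linear Attention Mechanism (LAM) entry above: linear-complexity attention mechanisms typically approximate dot-product attention by replacing the softmax with a kernel feature map and reassociating the matrix products so that the length-by-length attention matrix is never formed. The sketch below is a generic kernelized linear attention in PyTorch with an assumed elu(x) + 1 feature map; it illustrates that general idea and is not necessarily the exact formulation of the LAM paper.

```python
import torch
import torch.nn.functional as F


def linear_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Kernelized linear attention: phi(Q) @ (phi(K)^T @ V), normalized per query.

    q, k: (batch, length, dim); v: (batch, length, dim_v).
    phi(x) = elu(x) + 1 is an assumed, commonly used feature map, not
    necessarily the one used by LAM.
    """
    phi_q = F.elu(q) + 1                                  # (B, N, D)
    phi_k = F.elu(k) + 1                                  # (B, N, D)
    # Contracting over the length axis first keeps intermediates at D x D_v,
    # so the cost grows linearly in N rather than quadratically.
    kv = torch.einsum("bnd,bne->bde", phi_k, v)           # (B, D, D_v)
    norm = torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1)) + 1e-6  # (B, N)
    return torch.einsum("bnd,bde->bne", phi_q, kv) / norm.unsqueeze(-1)
```

Standard softmax attention materializes an N x N matrix softmax(QK^T) before multiplying by V; the reassociation above is what makes an approximately equivalent but computationally efficient attention possible.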