Coordinate Attention for Efficient Mobile Network Design
- URL: http://arxiv.org/abs/2103.02907v1
- Date: Thu, 4 Mar 2021 09:18:02 GMT
- Title: Coordinate Attention for Efficient Mobile Network Design
- Authors: Qibin Hou, Daquan Zhou, Jiashi Feng
- Abstract summary: We propose a novel attention mechanism for mobile networks by embedding positional information into channel attention.
Unlike channel attention that transforms a feature tensor to a single feature vector via 2D global pooling, the coordinate attention factorizes channel attention into two 1D feature encoding processes.
Our coordinate attention is beneficial to ImageNet classification and behaves better in down-stream tasks, such as object detection and semantic segmentation.
- Score: 96.40415345942186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies on mobile network design have demonstrated the remarkable
effectiveness of channel attention (e.g., the Squeeze-and-Excitation attention)
for lifting model performance, but they generally neglect the positional
information, which is important for generating spatially selective attention
maps. In this paper, we propose a novel attention mechanism for mobile networks
by embedding positional information into channel attention, which we call
"coordinate attention". Unlike channel attention that transforms a feature
tensor to a single feature vector via 2D global pooling, the coordinate
attention factorizes channel attention into two 1D feature encoding processes
that aggregate features along the two spatial directions, respectively. In this
way, long-range dependencies can be captured along one spatial direction and
meanwhile precise positional information can be preserved along the other
spatial direction. The resulting feature maps are then encoded separately into
a pair of direction-aware and position-sensitive attention maps that can be
complementarily applied to the input feature map to augment the representations
of the objects of interest. Our coordinate attention is simple and can be
flexibly plugged into classic mobile networks, such as MobileNetV2, MobileNeXt,
and EfficientNet with nearly no computational overhead. Extensive experiments
demonstrate that our coordinate attention is not only beneficial to ImageNet
classification but more interestingly, behaves better in down-stream tasks,
such as object detection and semantic segmentation. Code is available at
https://github.com/Andrew-Qibin/CoordAttention.
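The factorization described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository above for that). The shared projection `w_shared`, the ReLU nonlinearity, the bottleneck width `M`, and the random weights are assumptions made for illustration, and normalization layers are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, w_shared, w_h, w_w):
    """Apply a simplified coordinate attention to x of shape (C, H, W).

    w_shared: (M, C) 1x1 projection shared by both directions (assumed).
    w_h, w_w: (C, M) 1x1 projections producing per-direction gates (assumed).
    """
    c, h, w = x.shape
    # 1D pooling: averaging over width keeps the row index, and vice versa.
    pooled_h = x.mean(axis=2)  # (C, H)
    pooled_w = x.mean(axis=1)  # (C, W)
    # Encode both directions with a shared projection followed by ReLU.
    joint = np.concatenate([pooled_h, pooled_w], axis=1)  # (C, H + W)
    hidden = np.maximum(w_shared @ joint, 0.0)  # (M, H + W)
    hidden_h, hidden_w = hidden[:, :h], hidden[:, h:]
    # Separate projections give one gate per (channel, row) and (channel, column).
    gate_h = sigmoid(w_h @ hidden_h)  # (C, H)
    gate_w = sigmoid(w_w @ hidden_w)  # (C, W)
    # Broadcast: position (i, j) is scaled by gate_h[:, i] * gate_w[:, j].
    return x * gate_h[:, :, None] * gate_w[:, None, :]


rng = np.random.default_rng(0)
C, H, W, M = 8, 5, 7, 4  # hypothetical sizes
x = rng.standard_normal((C, H, W))
out = coordinate_attention(
    x,
    rng.standard_normal((M, C)) * 0.1,
    rng.standard_normal((C, M)) * 0.1,
    rng.standard_normal((C, M)) * 0.1,
)
print(out.shape)  # (8, 5, 7)
```

In contrast, 2D global pooling would reduce `x` to a single vector of shape (C,), so every spatial position in a channel would receive the same scale. Here the two gates retain a row index and a column index, which is what makes the resulting attention position-sensitive.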
Related papers
- ELA: Efficient Local Attention for Deep Convolutional Neural Networks [15.976475674061287]
This paper introduces an Efficient Local Attention (ELA) method that achieves substantial performance improvements with a simple structure.
To overcome these challenges, we propose the incorporation of 1D convolution and Group Normalization feature enhancement techniques.
ELA can be seamlessly integrated into deep CNN networks such as ResNet, MobileNet, and DeepLab.
arXiv Detail & Related papers (2024-03-02T08:06:18Z)
- Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters will be used to conduct a dynamic convolutional operation on their corresponding input feature maps respectively.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- One Point is All You Need: Directional Attention Point for Feature Learning [51.44837108615402]
We present a novel attention-based mechanism for learning enhanced point features for tasks such as point cloud classification and segmentation.
We show that our attention mechanism can be easily incorporated into state-of-the-art point cloud classification and segmentation networks.
arXiv Detail & Related papers (2020-12-11T11:45:39Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method outperforms state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Rotate to Attend: Convolutional Triplet Attention Module [21.228370317693244]
We present triplet attention, a novel method for computing attention weights using a three-branch structure.
Our method is simple as well as efficient and can be easily plugged into classic backbone networks as an add-on module.
We demonstrate the effectiveness of our method on various challenging tasks including image classification on ImageNet-1k and object detection on MSCOCO and PASCAL VOC datasets.
arXiv Detail & Related papers (2020-10-06T21:31:00Z)
- Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images [10.835342317692884]
The accuracy of semantic segmentation in remote sensing images has been increased significantly by deep convolutional neural networks.
This paper proposes a Multi-Attention-Network (MANet) to address these issues.
A novel attention mechanism of kernel attention with linear complexity is proposed to alleviate the large computational demand in attention.
arXiv Detail & Related papers (2020-09-03T09:08:02Z)
- AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification [86.64702967379709]
We propose a novel search space for spatiotemporal attention cells, which allows the search algorithm to flexibly explore various design choices in the cell.
The discovered attention cells can be seamlessly inserted into existing backbone networks, e.g., I3D or S3D, and improve video accuracy by more than 2% on both Kinetics-600 and MiT datasets.
arXiv Detail & Related papers (2020-07-23T14:30:05Z)
- DanHAR: Dual Attention Network For Multimodal Human Activity Recognition Using Wearable Sensors [9.492607098644536]
We propose a novel dual attention method called DanHAR, which introduces the framework of blending channel attention and temporal attention on a CNN.
DanHAR achieves state-of-the-art performance with negligible overhead of parameters.
arXiv Detail & Related papers (2020-06-25T14:17:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.