Rotate to Attend: Convolutional Triplet Attention Module
- URL: http://arxiv.org/abs/2010.03045v2
- Date: Thu, 5 Nov 2020 19:08:52 GMT
- Title: Rotate to Attend: Convolutional Triplet Attention Module
- Authors: Diganta Misra, Trikay Nalamada, Ajay Uppili Arasanipalai, Qibin Hou
- Abstract summary: We present triplet attention, a novel method for computing attention weights using a three-branch structure.
Our method is simple as well as efficient and can be easily plugged into classic backbone networks as an add-on module.
We demonstrate the effectiveness of our method on various challenging tasks including image classification on ImageNet-1k and object detection on MSCOCO and PASCAL VOC datasets.
- Score: 21.228370317693244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benefiting from the capability of building inter-dependencies among channels
or spatial locations, attention mechanisms have been extensively studied and
broadly used in a variety of computer vision tasks recently. In this paper, we
investigate light-weight but effective attention mechanisms and present triplet
attention, a novel method for computing attention weights by capturing
cross-dimension interaction using a three-branch structure. For an input
tensor, triplet attention builds inter-dimensional dependencies by the rotation
operation followed by residual transformations and encodes inter-channel and
spatial information with negligible computational overhead. Our method is
simple as well as efficient and can be easily plugged into classic backbone
networks as an add-on module. We demonstrate the effectiveness of our method on
various challenging tasks including image classification on ImageNet-1k and
object detection on MSCOCO and PASCAL VOC datasets. Furthermore, we provide
extensive insight into the performance of triplet attention by visually
inspecting the GradCAM and GradCAM++ results. The empirical evaluation of our
method supports our intuition on the importance of capturing dependencies
across dimensions when computing attention weights. Code for this paper can be
publicly accessed at https://github.com/LandskapeAI/triplet-attention
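To make the three-branch idea concrete, the following is a minimal PyTorch-style sketch written from the abstract's description: each branch rotates the input so that a different pair of dimensions faces a simple attention gate (pooling across the leading axis, a convolution, and a sigmoid), and the three attended tensors are averaged. This is an illustrative sketch, not the authors' implementation (see the linked repository for the official code); names such as ZPool, AttentionGate, and TripletAttention and the 7x7 kernel size are assumptions borrowed from common implementations.

import torch
import torch.nn as nn

class ZPool(nn.Module):
    # Concatenate max- and mean-pooled maps along the leading (channel-like) axis.
    def forward(self, x):
        return torch.cat([x.max(dim=1, keepdim=True)[0],
                          x.mean(dim=1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    # ZPool -> k x k convolution -> sigmoid, producing a single attention map.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(1)

    def forward(self, x):
        return torch.sigmoid(self.bn(self.conv(self.pool(x))))

class TripletAttention(nn.Module):
    # Three branches capture (C, W), (C, H), and (H, W) interactions via rotations.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.gate_cw = AttentionGate(kernel_size)
        self.gate_ch = AttentionGate(kernel_size)
        self.gate_hw = AttentionGate(kernel_size)

    def forward(self, x):                              # x: (N, C, H, W)
        # Branch 1: rotate so H plays the channel role, attend over (C, W).
        x_cw = x.permute(0, 2, 1, 3)                   # (N, H, C, W)
        out_cw = (x_cw * self.gate_cw(x_cw)).permute(0, 2, 1, 3)
        # Branch 2: rotate so W plays the channel role, attend over (C, H).
        x_ch = x.permute(0, 3, 2, 1)                   # (N, W, H, C)
        out_ch = (x_ch * self.gate_ch(x_ch)).permute(0, 3, 2, 1)
        # Branch 3: ordinary spatial attention over (H, W).
        out_hw = x * self.gate_hw(x)
        # Average the three rotated-back branches.
        return (out_cw + out_ch + out_hw) / 3.0

Used as an add-on, the module can simply be appended after a convolutional block, e.g. y = TripletAttention()(features); the overhead stays small because each branch learns only one small convolution over two pooled channels, consistent with the abstract's claim of negligible computational cost.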
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- ELA: Efficient Local Attention for Deep Convolutional Neural Networks [15.976475674061287]
This paper introduces an Efficient Local Attention (ELA) method that achieves substantial performance improvements with a simple structure.
To overcome the limitations of prior attention methods, we propose the incorporation of 1D convolution and Group Normalization feature enhancement techniques.
ELA can be seamlessly integrated into deep CNN networks such as ResNet, MobileNet, and DeepLab.
arXiv Detail & Related papers (2024-03-02T08:06:18Z)
- Efficient Multi-Scale Attention Module with Cross-Spatial Learning [4.046170185945849]
A novel efficient multi-scale attention (EMA) module is proposed.
We focus on retaining per-channel information while decreasing the computational overhead.
We conduct extensive ablation studies and experiments on image classification and object detection tasks.
arXiv Detail & Related papers (2023-05-23T00:35:47Z)
- CAT: Learning to Collaborate Channel and Spatial Attention from Multi-Information Fusion [23.72040577828098]
We propose a plug-and-play attention module, which we term "CAT", activating the Collaboration between spatial and channel Attentions.
Specifically, we represent traits as trainable coefficients (i.e., colla-factors) to adaptively combine contributions of different attention modules.
Our CAT outperforms existing state-of-the-art attention mechanisms in object detection, instance segmentation, and image classification.
arXiv Detail & Related papers (2022-12-13T02:34:10Z)
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- Coordinate Attention for Efficient Mobile Network Design [96.40415345942186]
We propose a novel attention mechanism for mobile networks by embedding positional information into channel attention.
Unlike channel attention, which transforms a feature tensor to a single feature vector via 2D global pooling, coordinate attention factorizes channel attention into two 1D feature encoding processes (a minimal sketch of this factorization appears after this list).
Our coordinate attention is beneficial to ImageNet classification and performs better in downstream tasks, such as object detection and semantic segmentation.
arXiv Detail & Related papers (2021-03-04T09:18:02Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- Attention Cube Network for Image Restoration [39.49175636499541]
We propose an attention cube network (A-CubeNet) for image restoration, aiming at more powerful feature expression and feature correlation learning.
We design a novel attention mechanism from three dimensions, namely spatial dimension, channel-wise dimension and hierarchical dimension.
Experiments demonstrate the superiority of our method over state-of-the-art image restoration methods in both quantitative comparison and visual analysis.
arXiv Detail & Related papers (2020-09-13T03:42:14Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images [24.35779077001839]
We propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations.
We introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundancy and improve the efficiency of the self-attention mechanism.
arXiv Detail & Related papers (2020-01-09T07:47:51Z)
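For comparison with the triplet-attention sketch above, the coordinate-attention entry in this list can be illustrated in the same style: channel attention is factorized into two 1D encodings, one pooled along height and one along width, which are transformed jointly and then applied as separate gates. This is a hedged sketch based on the summary above, not the authors' code; the bottleneck width, the ReLU non-linearity, and the layer names are illustrative assumptions.

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    # Channel attention factorized into height-wise and width-wise 1D encodings.
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)            # assumed bottleneck width
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)               # stand-in non-linearity
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):                              # x: (N, C, H, W)
        _, _, h, w = x.shape
        # 1D global average pooling along each spatial direction.
        x_h = x.mean(dim=3, keepdim=True)                         # (N, C, H, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)     # (N, C, W, 1)
        # Shared transform over the concatenated directional descriptors.
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        # Direction-specific attention maps, applied multiplicatively.
        a_h = torch.sigmoid(self.conv_h(y_h))                       # (N, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))   # (N, C, 1, W)
        return x * a_h * a_w

A CoordinateAttention(channels) block would be dropped into a backbone in the same way the triplet module is appended above.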