Channelized Axial Attention for Semantic Segmentation
- URL: http://arxiv.org/abs/2101.07434v2
- Date: Wed, 17 Mar 2021 16:24:11 GMT
- Title: Channelized Axial Attention for Semantic Segmentation
- Authors: Ye Huang, Wenjing Jia, Xiangjian He, Liu Liu, Yuxin Li, Dacheng Tao
- Abstract summary: We propose the Channelized Axial Attention (CAA) to seamlessly integrate channel attention and axial attention with reduced computational complexity.
Our CAA not only requires much less computation resources compared with other dual attention models such as DANet, but also outperforms the state-of-the-art ResNet-101-based segmentation models on all tested datasets.
- Score: 70.14921019774793
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-attention and channel attention, modelling the semantic interdependencies
in spatial and channel dimensions respectively, have recently been widely used
for semantic segmentation. However, computing spatial-attention and
channel attention separately and then fusing them directly can cause conflicting
feature representations. In this paper, we propose the Channelized Axial
Attention (CAA) to seamlessly integrate channel attention and axial attention
with reduced computational complexity. After computing axial attention maps, we
propose to channelize the intermediate results obtained from the
transposed dot-product so that the channel importance of each axial
representation is optimized across the whole receptive field. We
further develop grouped vectorization, which allows our model to be run with very
little memory consumption at a speed comparable to the full vectorization.
Comparative experiments conducted on multiple benchmark datasets, including
Cityscapes, PASCAL Context and COCO-Stuff, demonstrate that our CAA not
only requires much less computation resources compared with other dual attention
models such as DANet, but also outperforms the state-of-the-art ResNet-101-based
segmentation models on all tested datasets.
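The two ideas in the abstract, channelizing the intermediate attention results and grouped vectorization, can be illustrated with a rough NumPy sketch along a single axis. This is not the authors' implementation: the function names, the parameter-free sigmoid gate (standing in for a learned channel-attention module), and the choice to pool over query positions are all assumptions made for illustration. Full axial attention would apply such a step along the height axis and then the width axis of a feature map.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channelized_axial_1d(q, k, v):
    """Fully vectorized sketch. q, k: (L, Dk); v: (L, C). Returns (L, C)."""
    attn = softmax(q @ k.T, axis=-1)               # (L, L) axial attention map
    # Intermediate results: contribution of position j to position i, per channel.
    inter = attn[:, :, None] * v[None, :, :]       # (L, L, C)
    # Assumed gate: pool each intermediate representation over all query
    # positions, then squash to (0, 1). A real model would learn this mapping.
    gate = sigmoid(inter.mean(axis=0))             # (L, C)
    return (inter * gate[None, :, :]).sum(axis=1)  # (L, C)

def grouped_channelized_axial_1d(q, k, v, group_size):
    """Same result, but only `group_size` query rows of the (L, L, C)
    intermediate tensor are materialized at any one time."""
    L, C = v.shape
    attn = softmax(q @ k.T, axis=-1)
    # Pass 1: accumulate the pooled descriptor group by group.
    desc = np.zeros((L, C))
    for s in range(0, L, group_size):
        desc += (attn[s:s + group_size, :, None] * v[None, :, :]).sum(axis=0)
    gate = sigmoid(desc / L)
    # Pass 2: apply the gate and reduce, again one group at a time.
    out = np.empty((L, C))
    for s in range(0, L, group_size):
        inter = attn[s:s + group_size, :, None] * v[None, :, :]
        out[s:s + group_size] = (inter * gate[None, :, :]).sum(axis=1)
    return out
```

In this sketch, peak memory for the intermediate tensor drops from L x L x C to group_size x L x C, at the cost of a second pass over the groups; a larger `group_size` moves the computation back toward full vectorization.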
Related papers
- CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks [19.468704622654357]
We present a channel-wise spatially autocorrelated (CSA) attention mechanism for deep CNNs.
Inspired by geographical analysis, the proposed CSA exploits the spatial relationships between channels of feature maps to produce an effective channel descriptor.
We validate the effectiveness of the proposed CSA networks through extensive experiments and analysis on the ImageNet and MS COCO benchmark datasets.
arXiv Detail & Related papers (2024-05-09T13:21:03Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Efficient Multi-Scale Attention Module with Cross-Spatial Learning [4.046170185945849]
A novel efficient multi-scale attention (EMA) module is proposed.
We focus on retaining per-channel information while decreasing the computational overhead.
We conduct extensive ablation studies and experiments on image classification and object detection tasks.
arXiv Detail & Related papers (2023-05-23T00:35:47Z)
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
- Attention in Attention: Modeling Context Correlation for Efficient Video Classification [47.938500236792244]
This paper proposes an efficient attention-in-attention (AIA) method for focus-wise feature refinement.
We instantiate video feature contexts as dynamics aggregated along a specific axis with global average and pooling operations.
All the computational operations in attention units act on the pooled dimension, which adds very little computational cost.
arXiv Detail & Related papers (2022-04-20T08:37:52Z)
- Fully Attentional Network for Semantic Segmentation [17.24768249911501]
We propose Fully Attentional Network (FLANet) to encode both spatial and channel attentions in a single similarity map.
Our new method has achieved state-of-the-art performance on three challenging semantic segmentation datasets.
arXiv Detail & Related papers (2021-12-08T04:34:55Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Channel Pruning Guided by Spatial and Channel Attention for DNNs in Intelligent Edge Computing [15.248962858090431]
A critical challenge is to determine which channels are to be removed, so that the model accuracy will not be negatively affected.
We propose a new attention module combining both spatial and channel attention.
With the guidance of SCA, our CPSCA approach achieves higher inference accuracy than other state-of-the-art pruning methods.
arXiv Detail & Related papers (2020-11-08T02:40:06Z)
- Adaptive feature recombination and recalibration for semantic segmentation with Fully Convolutional Networks [57.64866581615309]
We propose recombination of features and a spatially adaptive recalibration block that is adapted for semantic segmentation with Fully Convolutional Networks.
Results indicate that Recombination and Recalibration improve the results of a competitive baseline, and generalize across three different problems.
arXiv Detail & Related papers (2020-06-19T15:45:03Z)
- Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images [24.35779077001839]
We propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations.
We introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundancy and improve the efficiency of the self-attention mechanism.
arXiv Detail & Related papers (2020-01-09T07:47:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.