Efficient Multi-Scale Attention Module with Cross-Spatial Learning
- URL: http://arxiv.org/abs/2305.13563v2
- Date: Tue, 6 Jun 2023 10:07:05 GMT
- Title: Efficient Multi-Scale Attention Module with Cross-Spatial Learning
- Authors: Daliang Ouyang, Su He, Guozhong Zhang, Mingzhu Luo, Huaiyong Guo, Jian Zhan, Zhijie Huang
- Abstract summary: A novel efficient multi-scale attention (EMA) module is proposed.
We focus on retaining per-channel information while decreasing the computational overhead.
We conduct extensive ablation studies and experiments on image classification and object detection tasks.
- Score: 4.046170185945849
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The remarkable effectiveness of channel or spatial attention mechanisms for
producing more discernible feature representations has been illustrated in various
computer vision tasks. However, modeling cross-channel relationships with
channel dimensionality reduction may bring side effects when extracting deep
visual representations. In this paper, a novel efficient multi-scale attention
(EMA) module is proposed. Focusing on retaining per-channel information
while decreasing the computational overhead, we reshape part of the channels into
the batch dimension and group the channel dimension into multiple
sub-features, which makes the spatial semantic features well distributed inside
each feature group. Specifically, apart from encoding global information to
re-calibrate the channel-wise weights in each parallel branch, the output
features of the two parallel branches are further aggregated by a
cross-dimension interaction for capturing pixel-level pairwise relationships. We
conduct extensive ablation studies and experiments on image classification and
object detection tasks with popular benchmarks (e.g., CIFAR-100, ImageNet-1k,
MS COCO and VisDrone2019) to evaluate its performance.
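The channel-grouping step the abstract describes — folding channel groups into the batch dimension so each sub-feature is processed as its own sample — can be sketched as follows. This is a minimal, framework-free illustration of the reshape only; the function name and nested-list tensor representation are assumptions, not the authors' implementation.

```python
# Sketch of the EMA grouping step: a (B, C, H, W) feature map is split into
# G channel groups, and the group axis is folded into the batch axis,
# yielding (B*G, C//G, H, W). Tensors are modeled as nested lists here.

def group_channels(x, groups):
    """x: nested list of shape (B, C, H, W); returns shape (B*groups, C//groups, H, W)."""
    c = len(x[0])
    assert c % groups == 0, "channel count must be divisible by the group count"
    sub = c // groups
    out = []
    for sample in x:                      # each sample in the batch
        for g in range(groups):           # each channel group becomes its own "sample"
            out.append(sample[g * sub:(g + 1) * sub])
    return out

# toy example: batch of 2, 4 channels, 2x2 spatial resolution
x = [[[[0.0] * 2 for _ in range(2)] for _ in range(4)] for _ in range(2)]
y = group_channels(x, groups=2)
# y now has 4 "samples" (batch * groups), each with 2 channels of 2x2
```

In a tensor framework this is a single reshape, but the effect is the same: attention computed per group operates on a well-separated sub-feature, which is how the module avoids channel dimensionality reduction.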
Related papers
- Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion [9.098711843118629]
This paper introduces the state space model (SSM) and proposes a novel hybrid semantic segmentation network based on vision Mamba (CVMH-UNet).
This method designs a cross-scanning visual state space block (CVSSBlock) that uses cross 2D scanning (CS2D) to fully capture global information from multiple directions.
By incorporating convolutional neural network branches to overcome the constraints of Vision Mamba (VMamba) in acquiring local information, this approach facilitates a comprehensive analysis of both global and local features.
arXiv Detail & Related papers (2024-10-08T02:17:38Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- Joint Channel Estimation and Feedback with Masked Token Transformers in Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z)
- CAT: Learning to Collaborate Channel and Spatial Attention from Multi-Information Fusion [23.72040577828098]
We propose a plug-and-play attention module, which we term "CAT", activating the Collaboration between spatial and channel Attentions.
Specifically, we represent traits as trainable coefficients (i.e., colla-factors) to adaptively combine contributions of different attention modules.
Our CAT outperforms existing state-of-the-art attention mechanisms in object detection, instance segmentation, and image classification.
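The colla-factor idea above — trainable scalar coefficients that adaptively blend the outputs of different attention branches — can be sketched as follows. The softmax normalization of the factors and all names here are illustrative assumptions, not details taken from the paper.

```python
# Sketch of trainable "colla-factors": the outputs of several attention
# branches are blended by learned scalar coefficients, normalized here
# with a softmax so the blend weights sum to 1.

import math

def combine_attentions(branch_outputs, colla_factors):
    """branch_outputs: list of equal-length feature vectors; colla_factors: one scalar per branch."""
    exps = [math.exp(f) for f in colla_factors]
    total = sum(exps)
    weights = [e / total for e in exps]   # softmax over the colla-factors
    n = len(branch_outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, branch_outputs))
            for i in range(n)]

spatial = [1.0, 2.0, 3.0]   # hypothetical spatial-attention output
channel = [3.0, 2.0, 1.0]   # hypothetical channel-attention output
fused = combine_attentions([spatial, channel], colla_factors=[0.0, 0.0])
# equal factors give the plain average of the two branches: [2.0, 2.0, 2.0]
```

During training, the colla-factors would be updated by backpropagation like any other parameter, letting the network learn how much each attention type should contribute.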
arXiv Detail & Related papers (2022-12-13T02:34:10Z)
- Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers [5.177947445379688]
We propose a new segmentation model that combines convolutional neural networks with deep transformers.
Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
arXiv Detail & Related papers (2022-06-20T12:03:54Z)
- Channelized Axial Attention for Semantic Segmentation [70.14921019774793]
We propose the Channelized Axial Attention (CAA) to seamlessly integrate channel attention and axial attention with reduced computational complexity.
Our CAA not only requires far fewer computational resources than other dual attention models such as DANet, but also outperforms the state-of-the-art ResNet-101-based segmentation models on all tested datasets.
arXiv Detail & Related papers (2021-01-19T03:08:03Z)
- Dual Attention GANs for Semantic Image Synthesis [101.36015877815537]
We propose a novel Dual Attention GAN (DAGAN) to synthesize photo-realistic and semantically-consistent images.
We also propose two novel modules, i.e., a position-wise Spatial Attention Module (SAM) and a scale-wise Channel Attention Module (CAM).
DAGAN achieves remarkably better results than state-of-the-art methods, while using fewer model parameters.
arXiv Detail & Related papers (2020-08-29T17:49:01Z)
- Single Image Super-Resolution via a Holistic Attention Network [87.42409213909269]
We propose a new holistic attention network (HAN) to model the holistic interdependencies among layers, channels, and positions.
The proposed HAN adaptively emphasizes hierarchical features by considering correlations among layers.
Experiments demonstrate that the proposed HAN performs favorably against the state-of-the-art single image super-resolution approaches.
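The layer-correlation idea in the HAN summary — emphasizing hierarchical features according to correlations among layers — can be sketched as follows. The dot-product similarity and softmax weighting are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of layer-wise attention: features from L layers are reweighted by
# their pairwise correlations, so mutually consistent layers reinforce
# each other. Layer features are modeled as flat vectors here.

import math

def layer_attention(layer_feats):
    """layer_feats: list of L equal-length feature vectors (one per layer)."""
    L = len(layer_feats)
    # pairwise dot-product similarity between flattened layer features
    sim = [[sum(a * b for a, b in zip(layer_feats[i], layer_feats[j]))
            for j in range(L)] for i in range(L)]
    out = []
    for i in range(L):
        exps = [math.exp(s) for s in sim[i]]     # softmax over similarities
        tot = sum(exps)
        w = [e / tot for e in exps]
        # each output layer is a correlation-weighted mix of all layers
        out.append([sum(w[j] * layer_feats[j][k] for j in range(L))
                    for k in range(len(layer_feats[0]))])
    return out

# two identical layers attend to each other equally, so the mix returns them unchanged
feats = [[1.0, 0.0], [1.0, 0.0]]
mixed = layer_attention(feats)
```

In practice the layer features would be feature maps rather than vectors, but the weighting pattern is the same.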
arXiv Detail & Related papers (2020-08-20T04:13:15Z)
- Joint Self-Attention and Scale-Aggregation for Self-Calibrated Deraining Network [13.628218953897946]
In this paper, we propose an effective algorithm, called JDNet, to solve the single image deraining problem.
By carefully designing Scale-Aggregation and Self-Attention modules with Self-Calibrated convolution, the proposed model achieves better deraining results.
arXiv Detail & Related papers (2020-08-06T17:04:34Z)
- Sequential Hierarchical Learning with Distribution Transformation for Image Super-Resolution [83.70890515772456]
We build a sequential hierarchical learning super-resolution network (SHSR) for effective image SR.
We consider the inter-scale correlations of features, and devise a sequential multi-scale block (SMB) to progressively explore the hierarchical information.
Experiment results show SHSR achieves superior quantitative performance and visual quality to state-of-the-art methods.
arXiv Detail & Related papers (2020-07-19T01:35:53Z)
- Hybrid Multiple Attention Network for Semantic Segmentation in Aerial Images [24.35779077001839]
We propose a novel attention-based framework named Hybrid Multiple Attention Network (HMANet) to adaptively capture global correlations.
We introduce a simple yet effective region shuffle attention (RSA) module to reduce feature redundancy and improve the efficiency of the self-attention mechanism.
arXiv Detail & Related papers (2020-01-09T07:47:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.