Related papers: EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network

EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network

URL: http://arxiv.org/abs/2105.14447v1
Date: Sun, 30 May 2021 07:26:41 GMT
Title: EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network
Authors: Hu Zhang and Keke Zu and Jian Lu and Yuru Zou and Deyu Meng
Abstract summary: In this work, a novel lightweight and effective attention method named Pyramid Split Attention (PSA) module is proposed. By replacing the 3x3 convolution with the PSA module in the bottleneck blocks of the ResNet, a novel representational block named Efficient Pyramid Split Attention (EPSA) is obtained. The EPSA block can be easily added as a plug-and-play component into a well-established backbone network, and significant improvements on model performance can be achieved.
Score: 41.994043409345956
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, it has been demonstrated that the performance of a deep convolutional neural network can be effectively improved by embedding an attention module into it. In this work, a novel lightweight and effective attention method named Pyramid Split Attention (PSA) module is proposed. By replacing the 3x3 convolution with the PSA module in the bottleneck blocks of the ResNet, a novel representational block named Efficient Pyramid Split Attention (EPSA) is obtained. The EPSA block can be easily added as a plug-and-play component into a well-established backbone network, and significant improvements on model performance can be achieved. Hence, a simple and efficient backbone architecture named EPSANet is developed in this work by stacking these ResNet-style EPSA blocks. Correspondingly, a stronger multi-scale representation ability can be offered by the proposed EPSANet for various computer vision tasks including but not limited to, image classification, object detection, instance segmentation, etc. Without bells and whistles, the performance of the proposed EPSANet outperforms most of the state-of-the-art channel attention methods. As compared to the SENet-50, the Top-1 accuracy is improved by 1.93 % on ImageNet dataset, a larger margin of +2.7 box AP for object detection and an improvement of +1.7 mask AP for instance segmentation by using the Mask-RCNN on MS-COCO dataset are obtained. Our source code is available at:https://github.com/murufeng/EPSANet.

Related papers

RPCANet++: Deep Interpretable Robust PCA for Sparse Object Segmentation [51.37553739930992]
RPCANet++ is a sparse object segmentation framework that fuses the interpretability of RPCA with efficient deep architectures.<n>Our approach unfolds a relaxed RPCA model into a structured network comprising a Background Approximation Module (BAM), an Object Extraction Module (OEM) and an Image Restoration Module (IRM)<n>Experiments on diverse datasets demonstrate that RPCANet++ achieves state-of-the-art performance under various imaging scenarios.
arXiv Detail & Related papers (2025-08-06T08:19:37Z)
RRCANet: Recurrent Reusable-Convolution Attention Network for Infrared Small Target Detection [23.54800619558163]
Infrared small target detection is a challenging task due to its unique characteristics.<n>Recent CNN-based methods have achieved promising performance with heavy feature extraction and fusion modules.<n>We propose a recurrent reusable-convolution attention network (RRCA-Net) for infrared small target detection.
arXiv Detail & Related papers (2025-06-03T03:18:17Z)
PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN) PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
PARFormer: Transformer-based Multi-Task Network for Pedestrian Attribute Recognition [23.814762073093153]
We propose a pure transformer-based multi-task PAR network named PARFormer, which includes four modules. In the feature extraction module, we build a strong baseline for feature extraction, which achieves competitive results on several PAR benchmarks. In the viewpoint perception module, we explore the impact of viewpoints on pedestrian attributes, and propose a multi-view contrastive loss. In the attribute recognition module, we alleviate the negative-positive imbalance problem to generate the attribute predictions.
arXiv Detail & Related papers (2023-04-14T16:27:56Z)
A Tri-Layer Plugin to Improve Occluded Detection [100.99802831241583]
We propose a simple '' module for the detection head of two-stage object detectors to improve the recall of partially occluded objects. The module predicts a tri-layer of segmentation masks for the target object, the occluder and the occludee, and by doing so is able to better predict the mask of the target object. We also establish a COCO evaluation dataset to measure the recall performance of partially occluded and separated objects.
arXiv Detail & Related papers (2022-10-18T17:59:51Z)
An efficient encoder-decoder architecture with top-down attention for speech separation [25.092542427133704]
We provide a bio-inspired efficient encoder-decoder architecture by mimicking the brain's top-down attention, called TDANet. On three benchmark datasets, TDANet consistently achieved competitive separation performance to previous state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2022-09-30T03:09:53Z)
A Mask Attention Interaction and Scale Enhancement Network for SAR Ship Instance Segmentation [4.232332676611087]
We propose a mask attention interaction and scale enhancement network (MAI-SE-Net) for SAR ship instance segmentation. MAI uses an atrous spatial pyra-mid pooling (ASPP) to gain multi-resolution feature re-sponses, a non-local block (NLB) to model long-range spa-tial dependencies, and a concatenation shuffle attention block (CSAB) to improve interaction benefits.
arXiv Detail & Related papers (2022-07-08T14:04:04Z)
a novel attention-based network for fast salient object detection [14.246237737452105]
In the current salient object detection network, the most popular method is using U-shape structure. We propose a new deep convolution network architecture with three contributions. Results demonstrate that the proposed method can compress the model to 1/3 of the original size nearly without losing the accuracy.
arXiv Detail & Related papers (2021-12-20T12:30:20Z)
A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation [68.10621089649486]
We propose Attention Aggregation based Feature Pyramid Network (A2-FPN) to improve multi-scale feature learning. A2-FPN achieves an improvement of 2.0% and 1.4% mask AP when integrated into the strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.
arXiv Detail & Related papers (2021-05-07T11:51:08Z)
CARAFE++: Unified Content-Aware ReAssembly of FEatures [132.49582482421246]
We propose unified Content-Aware ReAssembly of FEatures (CARAFE++), a universal, lightweight and highly effective operator to fulfill this goal. CARAFE++ generates adaptive kernels on-the-fly to enable instance-specific content-aware handling. It shows consistent and substantial gains across all the tasks with negligible computational overhead.
arXiv Detail & Related papers (2020-12-07T07:34:57Z)
Regularized Densely-connected Pyramid Network for Salient Instance Segmentation [73.17802158095813]
We propose a new pipeline for end-to-end salient instance segmentation (SIS) To better use the rich feature hierarchies in deep networks, we propose the regularized dense connections. A novel multi-level RoIAlign based decoder is introduced to adaptively aggregate multi-level features for better mask predictions.
arXiv Detail & Related papers (2020-08-28T00:13:30Z)
A novel Region of Interest Extraction Layer for Instance Segmentation [3.5493798890908104]
This paper is motivated by the need to overcome the limitations of existing RoI extractors. The proposed layer (called Generic RoI Extractor - GRoIE) introduces non-local building blocks and attention mechanisms to boost the performance. GRoIE can be integrated seamlessly with every two-stage architecture for both object detection and instance segmentation tasks.
arXiv Detail & Related papers (2020-04-28T17:07:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.