EPSANet: An Efficient Pyramid Split Attention Block on Convolutional
Neural Network
- URL: http://arxiv.org/abs/2105.14447v1
- Date: Sun, 30 May 2021 07:26:41 GMT
- Authors: Hu Zhang and Keke Zu and Jian Lu and Yuru Zou and Deyu Meng
- Abstract summary: In this work, a novel lightweight and effective attention method named Pyramid Split Attention (PSA) module is proposed.
By replacing the 3x3 convolution with the PSA module in the bottleneck blocks of the ResNet, a novel representational block named Efficient Pyramid Split Attention (EPSA) is obtained.
The EPSA block can be easily added as a plug-and-play component into a well-established backbone network, and significant improvements on model performance can be achieved.
- Score: 41.994043409345956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, it has been demonstrated that the performance of a deep
convolutional neural network can be effectively improved by embedding an
attention module into it. In this work, a novel lightweight and effective
attention method named Pyramid Split Attention (PSA) module is proposed. By
replacing the 3x3 convolution with the PSA module in the bottleneck blocks of
the ResNet, a novel representational block named Efficient Pyramid Split
Attention (EPSA) is obtained. The EPSA block can be easily added as a
plug-and-play component into a well-established backbone network, and
significant improvements on model performance can be achieved. Hence, a simple
and efficient backbone architecture named EPSANet is developed in this work by
stacking these ResNet-style EPSA blocks. Correspondingly, a stronger
multi-scale representation ability can be offered by the proposed EPSANet for
various computer vision tasks including, but not limited to, image
classification, object detection, and instance segmentation. Without bells and
whistles, the proposed EPSANet outperforms most of the state-of-the-art
channel attention methods. Compared to SENet-50, the Top-1 accuracy is
improved by 1.93% on the ImageNet dataset, and larger margins of +2.7 box AP
for object detection and +1.7 mask AP for instance segmentation are obtained
using Mask-RCNN on the MS-COCO dataset.
Our source code is available at: https://github.com/murufeng/EPSANet.
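The abstract names the PSA module but does not spell out its mechanics. The following is a minimal sketch of the split-attention idea only, not the authors' implementation. It assumes that the channels are split evenly into branches, that each channel gets a scalar descriptor, and that a softmax across branches turns those descriptors into weights. The function name `psa_reweight`, the use of sigmoid(global average pool) as a stand-in for a learned SE-style weight module, and the omission of the per-branch multi-scale convolutions are all simplifications made for illustration.

```python
import math


def global_avg_pool(channel):
    """Mean of a 2D channel given as a list of rows."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def psa_reweight(feature_map, num_branches=4):
    """Toy split-attention pass over a list of 2D channels.

    Assumptions (not taken from the abstract): an even channel split,
    sigmoid(global average pool) in place of a learned weight module,
    and no multi-scale convolution inside the branches.
    """
    n = len(feature_map)
    if n % num_branches != 0:
        raise ValueError("channel count must be divisible by num_branches")
    per_branch = n // num_branches

    # 1. Split the channels into equally sized branches.
    branches = [
        feature_map[b * per_branch:(b + 1) * per_branch]
        for b in range(num_branches)
    ]

    # 2. One scalar descriptor per channel in each branch.
    descriptors = [
        [sigmoid(global_avg_pool(ch)) for ch in branch] for branch in branches
    ]

    # 3. Softmax across branches, separately for each channel position.
    weights = [[0.0] * per_branch for _ in range(num_branches)]
    for c in range(per_branch):
        column = softmax([descriptors[b][c] for b in range(num_branches)])
        for b in range(num_branches):
            weights[b][c] = column[b]

    # 4. Rescale every channel and concatenate in the original order.
    output = []
    for b in range(num_branches):
        for c in range(per_branch):
            w = weights[b][c]
            output.append([[w * v for v in row] for row in branches[b][c]])
    return output, weights


# Usage: 8 channels of size 2x2, channel k filled with the value k + 1.
feature_map = [[[float(k + 1)] * 2 for _ in range(2)] for k in range(8)]
output, weights = psa_reweight(feature_map, num_branches=4)
```

In this toy version the weights at each channel position sum to 1 across the branches, so branches compete for emphasis rather than being scaled independently.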
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
Integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- PARFormer: Transformer-based Multi-Task Network for Pedestrian Attribute Recognition [23.814762073093153]
We propose a pure transformer-based multi-task PAR network named PARFormer, which includes four modules.
In the feature extraction module, we build a strong baseline for feature extraction, which achieves competitive results on several PAR benchmarks.
In the viewpoint perception module, we explore the impact of viewpoints on pedestrian attributes, and propose a multi-view contrastive loss.
In the attribute recognition module, we alleviate the negative-positive imbalance problem to generate the attribute predictions.
arXiv Detail & Related papers (2023-04-14T16:27:56Z)
- A Tri-Layer Plugin to Improve Occluded Detection [100.99802831241583]
We propose a simple '' module for the detection head of two-stage object detectors to improve the recall of partially occluded objects.
The module predicts a tri-layer of segmentation masks for the target object, the occluder and the occludee, and by doing so is able to better predict the mask of the target object.
We also establish a COCO evaluation dataset to measure the recall performance of partially occluded and separated objects.
arXiv Detail & Related papers (2022-10-18T17:59:51Z)
- An efficient encoder-decoder architecture with top-down attention for speech separation [25.092542427133704]
We provide a bio-inspired efficient encoder-decoder architecture by mimicking the brain's top-down attention, called TDANet.
On three benchmark datasets, TDANet consistently achieved separation performance competitive with previous state-of-the-art (SOTA) methods.
arXiv Detail & Related papers (2022-09-30T03:09:53Z)
- A Mask Attention Interaction and Scale Enhancement Network for SAR Ship Instance Segmentation [4.232332676611087]
We propose a mask attention interaction and scale enhancement network (MAI-SE-Net) for SAR ship instance segmentation.
MAI uses an atrous spatial pyramid pooling (ASPP) to gain multi-resolution feature responses, a non-local block (NLB) to model long-range spatial dependencies, and a concatenation shuffle attention block (CSAB) to improve interaction benefits.
arXiv Detail & Related papers (2022-07-08T14:04:04Z)
- A novel attention-based network for fast salient object detection [14.246237737452105]
In current salient object detection networks, the most popular approach is to use a U-shaped structure.
We propose a new deep convolution network architecture with three contributions.
Results demonstrate that the proposed method can compress the model to 1/3 of the original size with almost no loss of accuracy.
arXiv Detail & Related papers (2021-12-20T12:30:20Z)
- A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation [68.10621089649486]
We propose Attention Aggregation based Feature Pyramid Network (A2-FPN) to improve multi-scale feature learning.
A2-FPN achieves improvements of 2.0% and 1.4% mask AP when integrated into strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.
arXiv Detail & Related papers (2021-05-07T11:51:08Z)
- CARAFE++: Unified Content-Aware ReAssembly of FEatures [132.49582482421246]
We propose unified Content-Aware ReAssembly of FEatures (CARAFE++), a universal, lightweight and highly effective operator to fulfill this goal.
CARAFE++ generates adaptive kernels on-the-fly to enable instance-specific content-aware handling.
It shows consistent and substantial gains across all the tasks with negligible computational overhead.
arXiv Detail & Related papers (2020-12-07T07:34:57Z)
- Regularized Densely-connected Pyramid Network for Salient Instance Segmentation [73.17802158095813]
We propose a new pipeline for end-to-end salient instance segmentation (SIS).
To better use the rich feature hierarchies in deep networks, we propose the regularized dense connections.
A novel multi-level RoIAlign based decoder is introduced to adaptively aggregate multi-level features for better mask predictions.
arXiv Detail & Related papers (2020-08-28T00:13:30Z)
- A novel Region of Interest Extraction Layer for Instance Segmentation [3.5493798890908104]
This paper is motivated by the need to overcome the limitations of existing RoI extractors.
The proposed layer (called Generic RoI Extractor - GRoIE) introduces non-local building blocks and attention mechanisms to boost the performance.
GRoIE can be integrated seamlessly with every two-stage architecture for both object detection and instance segmentation tasks.
arXiv Detail & Related papers (2020-04-28T17:07:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.