A^2-FPN: Attention Aggregation based Feature Pyramid Network for
Instance Segmentation
- URL: http://arxiv.org/abs/2105.03186v1
- Date: Fri, 7 May 2021 11:51:08 GMT
- Title: A^2-FPN: Attention Aggregation based Feature Pyramid Network for
Instance Segmentation
- Authors: Miao Hu and Yali Li and Lu Fang and Shengjin Wang
- Abstract summary: We propose Attention Aggregation based Feature Pyramid Network (A2-FPN) to improve multi-scale feature learning.
A2-FPN achieves an improvement of 2.0% and 1.4% mask AP when integrated into the strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.
- Score: 68.10621089649486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning pyramidal feature representations is crucial for recognizing object
instances at different scales. Feature Pyramid Network (FPN) is the classic
architecture to build a feature pyramid with high-level semantics throughout.
However, intrinsic defects in feature extraction and fusion inhibit FPN from
further aggregating more discriminative features. In this work, we propose
Attention Aggregation based Feature Pyramid Network (A^2-FPN), to improve
multi-scale feature learning through attention-guided feature aggregation. In
feature extraction, it extracts discriminative features by
collecting-distributing multi-level global context features, and mitigates the
semantic information loss due to drastically reduced channels. In feature
fusion, it aggregates complementary information from adjacent features to
generate location-wise reassembly kernels for content-aware sampling, and
employs channel-wise reweighting to enhance the semantic consistency before
element-wise addition. A^2-FPN shows consistent gains on different instance
segmentation frameworks. By replacing FPN with A^2-FPN in Mask R-CNN, our model
boosts the performance by 2.1% and 1.6% mask AP when using ResNet-50 and
ResNet-101 as backbone, respectively. Moreover, A^2-FPN achieves an improvement
of 2.0% and 1.4% mask AP when integrated into the strong baselines such as
Cascade Mask R-CNN and Hybrid Task Cascade.
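The fusion step described in the abstract (resample the coarser level, reweight channels, then add element-wise) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: nearest-neighbour upsampling stands in for the content-aware sampling with learned reassembly kernels, the channel gates are a parameter-free sigmoid of each channel's global average rather than a learned module, taking the gates from the finer level is an arbitrary choice, and the function names (`upsample_nearest`, `channel_gates`, `fuse`) are hypothetical.

```python
import math


def upsample_nearest(feat, factor=2):
    """Nearest-neighbour upsampling of a [C][H][W] nested list.

    Assumption: a simple stand-in for content-aware sampling; no
    reassembly kernels are predicted here.
    """
    return [
        [
            [row[x // factor] for x in range(len(row) * factor)]
            for row in channel
            for _ in range(factor)  # repeat each source row `factor` times
        ]
        for channel in feat
    ]


def channel_gates(feat):
    """One gate per channel: sigmoid of that channel's global average.

    Assumption: parameter-free gating; a trained model would use learned
    layers between the pooling and the sigmoid.
    """
    gates = []
    for channel in feat:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return gates


def fuse(lateral, coarse, factor=2):
    """Reweight the upsampled coarse feature per channel, then add it to the lateral one."""
    up = upsample_nearest(coarse, factor)
    gates = channel_gates(lateral)  # hypothetical choice: gates come from the finer level
    return [
        [[l + g * u for l, u in zip(l_row, u_row)] for l_row, u_row in zip(l_ch, u_ch)]
        for l_ch, u_ch, g in zip(lateral, up, gates)
    ]


# Toy example: 2 channels, a 1x1 coarse map and a 2x2 lateral map of zeros.
coarse = [[[1.0]], [[2.0]]]
lateral = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
fused = fuse(lateral, coarse)
print(fused)  # [[[0.5, 0.5], [0.5, 0.5]], [[1.0, 1.0], [1.0, 1.0]]]
```

With an all-zero lateral map every gate is sigmoid(0) = 0.5, so each upsampled value is halved before the addition. This makes the effect of the reweighting easy to check by hand.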
Related papers
- Retro-FPN: Retrospective Feature Pyramid Network for Point Cloud
Semantic Segmentation [65.78483246139888]
We propose Retro-FPN to model the per-point feature prediction as an explicit and retrospective refining process.
Its key novelty is a retro-transformer for summarizing semantic contexts from the previous layer.
We show that Retro-FPN can significantly improve performance over state-of-the-art backbones.
arXiv Detail & Related papers (2023-08-18T05:28:25Z)
- PARFormer: Transformer-based Multi-Task Network for Pedestrian Attribute
Recognition [23.814762073093153]
We propose a pure transformer-based multi-task PAR network named PARFormer, which includes four modules.
In the feature extraction module, we build a strong baseline for feature extraction, which achieves competitive results on several PAR benchmarks.
In the viewpoint perception module, we explore the impact of viewpoints on pedestrian attributes, and propose a multi-view contrastive loss.
In the attribute recognition module, we alleviate the negative-positive imbalance problem to generate the attribute predictions.
arXiv Detail & Related papers (2023-04-14T16:27:56Z)
- Semantic Feature Integration network for Fine-grained Visual
Classification [5.182627302449368]
We propose the Semantic Feature Integration network (SFI-Net) to address the above difficulties.
By eliminating unnecessary features and reconstructing the semantic relations among discriminative features, our SFI-Net has achieved satisfying performance.
arXiv Detail & Related papers (2023-02-13T07:32:25Z)
- RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object
Detection [10.847953426161924]
We propose RCNet, which consists of Reverse Feature Pyramid (RevFP) and Cross-scale Shift Network (CSN).
RevFP utilizes local bidirectional feature fusion to simplify the bidirectional pyramid inference pipeline.
CSN directly propagates representations to both adjacent and non-adjacent levels to make multi-scale features more correlated.
arXiv Detail & Related papers (2021-10-23T04:00:25Z)
- FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [6.613724825924151]
We propose a feature alignment module that learns transformation offsets of pixels to contextually align upsampled features.
We then integrate these two modules in a top-down pyramidal architecture and present the Feature-aligned Pyramid Network (FaPN).
In particular, our FaPN achieves the state-of-the-art of 56.7% mIoU on ADE20K when integrated within Mask-Former.
arXiv Detail & Related papers (2021-08-16T12:52:42Z)
- CARAFE++: Unified Content-Aware ReAssembly of FEatures [132.49582482421246]
We propose unified Content-Aware ReAssembly of FEatures (CARAFE++), a universal, lightweight and highly effective operator to fulfill this goal.
CARAFE++ generates adaptive kernels on-the-fly to enable instance-specific content-aware handling.
It shows consistent and substantial gains across all the tasks with negligible computational overhead.
arXiv Detail & Related papers (2020-12-07T07:34:57Z)
- $P^2$ Net: Augmented Parallel-Pyramid Net for Attention Guided Pose
Estimation [69.25492391672064]
We propose an augmented Parallel-Pyramid Net with feature refinement by dilated bottleneck and attention module.
A parallel-pyramid structure is followed to compensate for the information loss introduced by the network.
Our method achieves the best performance on the challenging MSCOCO and MPII datasets.
arXiv Detail & Related papers (2020-10-26T02:10:12Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- A novel Region of Interest Extraction Layer for Instance Segmentation [3.5493798890908104]
This paper is motivated by the need to overcome the limitations of existing RoI extractors.
The proposed layer (called Generic RoI Extractor - GRoIE) introduces non-local building blocks and attention mechanisms to boost the performance.
GRoIE can be integrated seamlessly with every two-stage architecture for both object detection and instance segmentation tasks.
arXiv Detail & Related papers (2020-04-28T17:07:32Z)
- When Residual Learning Meets Dense Aggregation: Rethinking the
Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.