Multi-Scale Feature Aggregation by Cross-Scale Pixel-to-Region Relation Operation for Semantic Segmentation
- URL: http://arxiv.org/abs/2106.01744v1
- Date: Thu, 3 Jun 2021 10:49:48 GMT
- Title: Multi-Scale Feature Aggregation by Cross-Scale Pixel-to-Region Relation Operation for Semantic Segmentation
- Authors: Yechao Bai, Ziyuan Huang, Lyuyu Shen, Hongliang Guo, Marcelo H. Ang Jr and Daniela Rus
- Abstract summary: We aim to enable the low-level feature to aggregate the complementary context from adjacent high-level feature maps by a cross-scale pixel-to-region operation.
We employ an efficient feature pyramid network to obtain multi-scale features.
Experimental results show that the RSP head performs competitively on both semantic segmentation and panoptic segmentation with high efficiency.
- Score: 44.792859259093085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploiting multi-scale features has shown great potential in tackling semantic segmentation problems. The aggregation is commonly done with sum or concatenation (concat) followed by convolutional (conv) layers. However, this fully passes the high-level context down to the next level of the hierarchy without considering the interrelation between levels. In this work, we aim to enable the low-level feature to aggregate the complementary context from adjacent high-level feature maps through a cross-scale pixel-to-region relation operation. We leverage cross-scale context propagation so that long-range dependencies can be captured even by the high-resolution low-level features. To this end, we employ an efficient feature pyramid network to obtain multi-scale features. We propose a Relational Semantics Extractor (RSE) and a Relational Semantics Propagator (RSP) for context extraction and propagation, respectively. We then stack several RSPs into an RSP head to achieve a progressive top-down distribution of the context. Experimental results on two challenging datasets, Cityscapes and COCO, demonstrate that the RSP head performs competitively on both semantic segmentation and panoptic segmentation with high efficiency. In the semantic segmentation task, it outperforms DeepLabV3 [1] by 0.7% while using 75% fewer FLOPs (multiply-adds).
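The abstract does not give the internals of RSE and RSP, so the following is only a minimal sketch of the cross-scale pixel-to-region idea under stated assumptions. NumPy arrays stand in for feature maps; the hypothetical `relational_semantics_extractor` pools a coarse map into `K` soft region descriptors through an assumed projection `w_region`; the hypothetical `relational_semantics_propagator` lets every pixel of the finer map attend over those regions with scaled dot-product attention and adds the result back residually; and `rsp_head` chains the propagators from coarse to fine. Learned convolutions, embeddings, and normalization layers are omitted, and none of these names or design choices come from the paper.

```python
# Hypothetical sketch: names, shapes, and the attention form are assumptions,
# not the RSE/RSP design from the paper.
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def relational_semantics_extractor(high, w_region):
    """Assumed RSE: pool a coarse map into K soft region descriptors.

    high:     (C, h, w) high-level feature map
    w_region: (K, C) projection giving K region-assignment logits per pixel
    returns:  (K, C) region descriptors
    """
    c = high.shape[0]
    flat = high.reshape(c, -1)                  # (C, h*w)
    logits = w_region @ flat                    # (K, h*w)
    assign = softmax(logits, axis=1)            # each region is a weighting over pixels
    return assign @ flat.T                      # (K, C)


def relational_semantics_propagator(low, regions):
    """Assumed RSP: every low-level pixel attends over the K regions.

    low:     (C, H, W) higher-resolution feature map
    regions: (K, C) descriptors from the adjacent coarser level
    returns: (C, H, W) low-level map plus its pixel-to-region context
    """
    c, hh, ww = low.shape
    pixels = low.reshape(c, -1).T               # (H*W, C)
    relation = softmax(pixels @ regions.T / np.sqrt(c), axis=1)  # (H*W, K)
    context = relation @ regions                # (H*W, C)
    return low + context.T.reshape(c, hh, ww)   # residual aggregation


def rsp_head(pyramid, region_weights):
    """Top-down stack: each level receives context from the updated level above it.

    pyramid:        list of (C, H_i, W_i) maps ordered coarse to fine
    region_weights: one (K, C) projection per coarse level (len(pyramid) - 1)
    """
    out = [pyramid[0]]
    for level, w_region in zip(pyramid[1:], region_weights):
        regions = relational_semantics_extractor(out[-1], w_region)
        out.append(relational_semantics_propagator(level, regions))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, K = 16, 8
    # Toy FPN-like pyramid with random features, ordered coarse to fine.
    pyramid = [rng.standard_normal((C, s, s)) for s in (4, 8, 16, 32)]
    weights = [rng.standard_normal((K, C)) * 0.1 for _ in range(3)]
    outputs = rsp_head(pyramid, weights)
    print([o.shape for o in outputs])  # [(16, 4, 4), (16, 8, 8), (16, 16, 16), (16, 32, 32)]
```

Because each finer level attends over only `K` region vectors instead of every pixel of the coarser map, the relation step in this sketch costs on the order of H·W·K rather than H·W·h·w. That is one plausible way a pixel-to-region head can stay cheap; it is not a claim about how the authors obtain their FLOPs figure.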
Related papers
- Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a three-stage working strategy: distributing the language feature, spatial semantic recurrent co-parsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv (2024-05-15T00:17:48Z)
- Multi-Content Interaction Network for Few-Shot Segmentation [37.80624074068096]
Few-shot segmentation (FSS) is challenging because of limited support images and large intra-class appearance discrepancies.
We propose a Multi-Content Interaction Network (MCINet) to remedy this issue.
MCINet improves FSS by incorporating the low-level structural information from another query branch into the high-level semantic features.
arXiv (2023-03-11T04:21:59Z)
- CFNet: Learning Correlation Functions for One-Stage Panoptic Segmentation [46.252118473248316]
We propose to first predict semantic-level and instance-level correlations among different locations, which are used to enhance the backbone features.
We then feed the enhanced, more discriminative features into the corresponding segmentation heads.
We achieve state-of-the-art performance on MS COCO with 45.1% PQ and on ADE20K with 32.6% PQ.
arXiv (2022-01-13T05:31:14Z)
- Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely a dense feature pyramid network (DenseFPN), a spatial context pyramid (SCP), and a hierarchical region of interest extractor (HRoIE).
arXiv (2021-11-22T08:55:25Z)
- HS3: Learning with Proper Task Complexity in Hierarchically Supervised Semantic Segmentation [81.87943324048756]
We propose Hierarchically Supervised Semantic Segmentation (HS3), a training scheme that supervises intermediate layers in a segmentation network to learn meaningful representations by varying task complexity.
Our proposed HS3-Fuse framework further improves segmentation predictions and achieves state-of-the-art results on two large segmentation benchmarks: NYUD-v2 and Cityscapes.
arXiv (2021-11-03T16:33:29Z)
- A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation [68.10621089649486]
We propose the Attention Aggregation based Feature Pyramid Network (A2-FPN) to improve multi-scale feature learning.
A2-FPN achieves mask AP improvements of 2.0% and 1.4% when integrated into strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.
arXiv (2021-05-07T11:51:08Z)
- A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection [74.88284082187462]
One common strategy is to adopt dilated convolutions in the backbone network to extract high-resolution feature maps.
We propose a novel holistically-guided decoder to obtain high-resolution, semantically rich feature maps.
arXiv (2020-12-18T10:51:49Z)
- Sequential Hierarchical Learning with Distribution Transformation for Image Super-Resolution [83.70890515772456]
We build a sequential hierarchical learning super-resolution network (SHSR) for effective image super-resolution (SR).
We consider the inter-scale correlations of features and devise a sequential multi-scale block (SMB) to progressively explore the hierarchical information.
Experimental results show that SHSR achieves quantitative performance and visual quality superior to state-of-the-art methods.
arXiv (2020-07-19T01:35:53Z)
- Associating Multi-Scale Receptive Fields for Fine-grained Recognition [5.079292308180334]
We propose a novel cross-layer non-local (CNL) module to associate multi-scale receptive fields by two operations.
CNL computes correlations between features of a query layer and all response layers.
Our model builds spatial dependencies among multi-level layers and learns more discriminative features.
arXiv (2020-05-19T01:16:31Z)