GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in
Pixel Labeling
- URL: http://arxiv.org/abs/2005.13363v2
- Date: Sun, 28 Jun 2020 13:51:04 GMT
- Title: GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in
Pixel Labeling
- Authors: Zhuoying Wang and Yongtao Wang and Zhi Tang and Yangyan Li and Ying
Chen and Haibin Ling and Weisi Lin
- Abstract summary: We propose the Gated Scale-Transfer Operation (GSTO) to properly transit spatial-filtered features to another scale.
By plugging GSTO into HRNet, we get a more powerful backbone for pixel labeling.
Experiment results demonstrate that GSTO can also significantly boost the performance of multi-scale feature aggregation modules.
- Score: 92.90448357454274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing CNN-based methods for pixel labeling heavily depend on multi-scale
features to meet the requirements of both semantic comprehension and detail
preservation. State-of-the-art pixel labeling neural networks widely exploit
conventional scale-transfer operations, i.e., up-sampling and down-sampling to
learn multi-scale features. In this work, we find that these operations lead to
scale-confused features and suboptimal performance because they are
spatial-invariant and directly transit all feature information across scales
without spatial selection. To address this issue, we propose the Gated
Scale-Transfer Operation (GSTO) to properly transit spatial-filtered features
to another scale. Specifically, GSTO can work either with or without extra
supervision. Unsupervised GSTO is learned from the feature itself while the
supervised one is guided by the supervised probability matrix. Both forms of
GSTO are lightweight and plug-and-play, which can be flexibly integrated into
networks or modules for learning better multi-scale features. In particular, by
plugging GSTO into HRNet, we get a more powerful backbone (namely GSTO-HRNet)
for pixel labeling, and it achieves new state-of-the-art results on the COCO
benchmark for human pose estimation and other benchmarks for semantic
segmentation including Cityscapes, LIP and Pascal Context, with negligible
extra computational cost. Moreover, experiment results demonstrate that GSTO
can also significantly boost the performance of multi-scale feature aggregation
modules like PPM and ASPP. Code will be made available at
https://github.com/VDIGPKU/GSTO.
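
The gating idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the unsupervised gate is a sigmoid over a 1x1 convolution of the feature itself, and that the scale transfer is a 2x average pooling. The function names, the `weights` and `bias` parameters, and the nested-list feature layout are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_gate(feature, weights, bias=0.0):
    """Per-pixel gate in (0, 1): sigmoid of a 1x1 convolution over channels.

    feature: list of C channels, each an H x W list of lists.
    weights: list of C floats (hypothetical learned parameters).
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    return [
        [
            sigmoid(bias + sum(weights[c] * feature[c][y][x] for c in range(channels)))
            for x in range(width)
        ]
        for y in range(height)
    ]


def avg_pool_2x(channel):
    """Conventional, spatially invariant 2x down-sampling by average pooling."""
    height, width = len(channel), len(channel[0])
    return [
        [
            (channel[y][x] + channel[y][x + 1]
             + channel[y + 1][x] + channel[y + 1][x + 1]) / 4.0
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


def gated_downsample(feature, weights, bias=0.0):
    """Gate every pixel first, then transfer the filtered feature to the coarser scale."""
    gate = spatial_gate(feature, weights, bias)
    gated = [
        [[v * g for v, g in zip(row, gate_row)] for row, gate_row in zip(channel, gate)]
        for channel in feature
    ]
    return [avg_pool_2x(channel) for channel in gated]


# One channel, 2x2 input; zero weights give a gate of 0.5 at every pixel.
feature = [[[1.0, 1.0], [1.0, 1.0]]]
coarse = gated_downsample(feature, weights=[0.0])
```

In this sketch the gate is computed from the feature alone, which corresponds to the unsupervised form. For the supervised form, the abstract says the gate is guided by a supervised probability matrix; how that matrix is turned into a gate is not specified here, so it is left out.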
Related papers
- ComPtr: Towards Diverse Bi-source Dense Prediction Tasks via A Simple
yet General Complementary Transformer [91.43066633305662]
We propose a novel ComPlementary transformer, ComPtr, for diverse bi-source dense prediction tasks.
ComPtr treats different inputs equally and builds an efficient dense interaction model in the form of sequence-to-sequence on top of the transformer.
arXiv Detail & Related papers (2023-07-23T15:17:45Z)
- Feature Aggregation and Propagation Network for Camouflaged Object
Detection [42.33180748293329]
Camouflaged object detection (COD) aims to detect/segment camouflaged objects embedded in the environment.
Several COD methods have been developed, but they still suffer from unsatisfactory performance due to intrinsic similarities between foreground objects and background surroundings.
We propose a novel Feature Aggregation and Propagation Network (FAP-Net) for camouflaged object detection.
arXiv Detail & Related papers (2022-12-02T05:54:28Z)
- Transformer Scale Gate for Semantic Segmentation [53.27673119360868]
Transformer Scale Gate (TSG) exploits cues in self and cross attentions in Vision Transformers for the scale selection.
Our experiments on the Pascal Context and ADE20K datasets demonstrate that our feature selection strategy achieves consistent gains.
arXiv Detail & Related papers (2022-05-14T13:11:39Z)
- Multi-scale and Cross-scale Contrastive Learning for Semantic
Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv Detail & Related papers (2022-03-25T01:24:24Z)
- Learning to Aggregate Multi-Scale Context for Instance Segmentation in
Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
The approach also shows consistent efficiency gains on a recent transformer-based image recognition model.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- GhostSR: Learning Ghost Features for Efficient Image Super-Resolution [49.393251361038025]
Single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve impressive performance but require huge computational costs.
We propose to use shift operation to generate the redundant features (i.e., Ghost features) of SISR models.
We show that both the non-compact and lightweight SISR models embedded in our proposed module can achieve comparable performance to that of their baselines.
arXiv Detail & Related papers (2021-01-21T10:09:47Z)
- Sequential Hierarchical Learning with Distribution Transformation for
Image Super-Resolution [83.70890515772456]
We build a sequential hierarchical learning super-resolution network (SHSR) for effective image SR.
We consider the inter-scale correlations of features, and devise a sequential multi-scale block (SMB) to progressively explore the hierarchical information.
Experiment results show SHSR achieves superior quantitative performance and visual quality to state-of-the-art methods.
arXiv Detail & Related papers (2020-07-19T01:35:53Z)
- Distance Guided Channel Weighting for Semantic Segmentation [4.10724123131976]
We introduce the Distance Guided Channel Weighting (DGCW) module.
The DGCW module is constructed in a pixel-wise context extraction manner.
We propose the Distance Guided Channel Weighting Network (DGCWNet).
arXiv Detail & Related papers (2020-04-27T09:57:12Z)