Pyramid Fusion Transformer for Semantic Segmentation
- URL: http://arxiv.org/abs/2201.04019v4
- Date: Tue, 30 May 2023 10:27:46 GMT
- Title: Pyramid Fusion Transformer for Semantic Segmentation
- Authors: Zipeng Qin, Jianbo Liu, Xiaolin Zhang, Maoqing Tian, Aojun Zhou, Shuai
Yi, Hongsheng Li
- Abstract summary: We propose the Pyramid Fusion Transformer (PFT), a transformer-based decoder for per-mask-classification semantic segmentation with multi-scale features.
We achieve competitive performance on three widely used semantic segmentation datasets.
- Score: 44.57867861592341
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently proposed MaskFormer gives a refreshed perspective on the task of
semantic segmentation: it shifts from the popular pixel-level classification
paradigm to a mask-level classification method. In essence, it generates paired
probabilities and masks corresponding to category segments and combines them
during inference to produce the segmentation maps. In our study, we find that a
per-mask classification decoder built on top of a single-scale feature is not
effective enough to extract reliable probabilities or masks. To mine rich
semantic information across the feature pyramid, we propose the Pyramid Fusion
Transformer (PFT) for per-mask-classification semantic segmentation with
multi-scale features. The proposed transformer decoder performs cross-attention
between learnable queries and each spatial feature from the feature pyramid in
parallel and uses cross-scale inter-query attention to exchange complementary
information. We achieve competitive performance on three widely used semantic
segmentation datasets. In particular, on ADE20K validation set, our result with
Swin-B backbone surpasses that of MaskFormer's with a much larger Swin-L
backbone in both single-scale and multi-scale inference, achieving 54.1 mIoU
and 55.7 mIoU respectively. Using a Swin-L backbone, we achieve single-scale
56.1 mIoU and multi-scale 57.4 mIoU, obtaining state-of-the-art performance on
the dataset. Extensive experiments on three widely used semantic segmentation
datasets verify the effectiveness of our proposed method.
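The decoder design described in the abstract — parallel cross-attention between learnable queries and each pyramid scale, cross-scale inter-query attention, and a MaskFormer-style combination of per-query class probabilities and masks — can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: single-head attention, shared random weights standing in for learned parameters, and a simple mean over scales as the fusion step are all assumptions of this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, feats, Wq, Wk, Wv):
    # queries: (Q, D); feats: (N, D) flattened spatial feature map.
    q, k, v = queries @ Wq, feats @ Wk, feats @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v

rng = np.random.default_rng(0)
Q, D, C = 8, 16, 5                                  # queries, channels, classes
# Three pyramid scales, spatial dims already flattened to (N, D).
pyramid = [rng.standard_normal((n, D)) for n in (64, 16, 4)]

# One set of learnable queries per scale; each set attends to its own
# scale in parallel (weights here are random stand-ins for learned ones).
queries = [rng.standard_normal((Q, D)) for _ in pyramid]
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
updated = [q + cross_attention(q, f, Wq, Wk, Wv)
           for q, f in zip(queries, pyramid)]

# Cross-scale inter-query attention: queries from all scales attend to
# each other to exchange complementary information.
stacked = np.concatenate(updated, axis=0)           # (3*Q, D)
exchanged = stacked + cross_attention(stacked, stacked, Wq, Wk, Wv)
# Fuse per-scale queries; a mean over scales is an assumption here.
fused = exchanged.reshape(len(pyramid), Q, D).mean(axis=0)   # (Q, D)

# Per-mask classification head: each query yields class probabilities
# and a soft binary mask; combine them for the semantic map.
H = W = 8
Wc = rng.standard_normal((D, C)) * 0.1
Wm = rng.standard_normal((D, H * W)) * 0.1
probs = softmax(fused @ Wc, axis=-1)                # (Q, C)
masks = 1.0 / (1.0 + np.exp(-(fused @ Wm)))         # (Q, H*W), sigmoid
seg = np.einsum('qc,qp->cp', probs, masks).argmax(axis=0).reshape(H, W)
```

The final line is the mask-classification inference rule the abstract alludes to: each pixel takes the class whose probability-weighted mask evidence, summed over all queries, is largest.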
Related papers
- Pyramid Hierarchical Transformer for Hyperspectral Image Classification [1.9427851979929982]
We propose a pyramid-based hierarchical transformer (PyFormer)
This innovative approach organizes input data hierarchically into segments, each representing distinct abstraction levels.
Results underscore the superiority of the proposed method over traditional approaches.
arXiv Detail & Related papers (2024-04-23T11:41:19Z)
- HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation [113.6560373226501]
This work studies semantic segmentation under the domain generalization setting.
We propose a novel hierarchical grouping transformer (HGFormer) to explicitly group pixels to form part-level masks and then whole-level masks.
Experiments show that HGFormer yields more robust semantic segmentation results than per-pixel classification methods and flat grouping transformers.
arXiv Detail & Related papers (2023-05-22T13:33:41Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We adopt transformers in this work and incorporate them into a hierarchical framework for shape classification and for part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
- MaskRange: A Mask-classification Model for Range-view based LiDAR Segmentation [34.04740351544143]
We propose a unified mask-classification model, MaskRange, for the range-view based LiDAR semantic and panoptic segmentation.
Our MaskRange achieves state-of-the-art performance with $66.10$ mIoU on semantic segmentation and promising results with $53.10$ PQ on panoptic segmentation with high efficiency.
arXiv Detail & Related papers (2022-06-24T04:39:49Z)
- Per-Pixel Classification is Not All You Need for Semantic Segmentation [184.2905747595058]
Mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks.
We propose MaskFormer, a simple mask classification model which predicts a set of binary masks.
Our method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
arXiv Detail & Related papers (2021-07-13T17:59:50Z)
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on-par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
- Regularized Densely-connected Pyramid Network for Salient Instance Segmentation [73.17802158095813]
We propose a new pipeline for end-to-end salient instance segmentation (SIS)
To better use the rich feature hierarchies in deep networks, we propose the regularized dense connections.
A novel multi-level RoIAlign based decoder is introduced to adaptively aggregate multi-level features for better mask predictions.
arXiv Detail & Related papers (2020-08-28T00:13:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.