Panoptic SegFormer
- URL: http://arxiv.org/abs/2109.03814v1
- Date: Wed, 8 Sep 2021 17:59:12 GMT
- Title: Panoptic SegFormer
- Authors: Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M.
Alvarez, Tong Lu, Ping Luo
- Abstract summary: We present Panoptic-SegFormer, a framework for end-to-end panoptic segmentation with Transformers.
With a ResNet-50 backbone, our method achieves 50.0% PQ on the COCO test-dev split.
Using a more powerful PVTv2-B5 backbone, Panoptic SegFormer achieves a new record of 54.1% PQ and 54.4% PQ on the COCO val and test-dev splits with single-scale input.
- Score: 82.6258003344804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Panoptic SegFormer, a general framework for end-to-end panoptic
segmentation with Transformers. The proposed method extends Deformable DETR
with a unified mask prediction workflow for both things and stuff, making the
panoptic segmentation pipeline concise and effective. With a ResNet-50
backbone, our method achieves 50.0% PQ on the COCO test-dev split, surpassing
previous state-of-the-art methods by significant margins without bells and
whistles. Using a more powerful PVTv2-B5 backbone, Panoptic SegFormer achieves
a new record of 54.1% PQ and 54.4% PQ on the COCO val and test-dev splits with
single-scale input.
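The PQ (panoptic quality) figures above follow the standard metric for panoptic segmentation: the sum of IoUs over matched (true-positive) segment pairs, divided by the count of true positives plus half the false positives and half the false negatives. A minimal sketch with made-up IoU values (the numbers here are illustrative, not from the paper):

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = sum(IoU over TP matches) / (|TP| + 0.5*|FP| + 0.5*|FN|)."""
    tp = len(matched_ious)
    return sum(matched_ious) / (tp + 0.5 * num_fp + 0.5 * num_fn)

# Example: 3 matched segments, 1 spurious prediction, 1 missed ground truth.
pq = panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=1)
print(round(pq, 3))  # 0.6
```

In the full metric, prediction and ground-truth segments are matched only when their IoU exceeds 0.5, and PQ is averaged over categories.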
Related papers
- PAUMER: Patch Pausing Transformer for Semantic Segmentation [3.3148826359547523]
We study the problem of improving the efficiency of segmentation transformers by using disparate amounts of computation for different parts of the image.
Our method, PAUMER, accomplishes this by pausing computation for patches that are deemed to not need any more computation before the final decoder.
arXiv Detail & Related papers (2023-11-01T15:32:11Z)
- You Only Segment Once: Towards Real-Time Panoptic Segmentation [68.91492389185744]
YOSO is a real-time panoptic segmentation framework.
YOSO predicts masks via dynamic convolutions between panoptic kernels and image feature maps.
YOSO achieves 46.4 PQ, 45.6 FPS on COCO; 52.5 PQ, 22.6 FPS on Cityscapes; 38.0 PQ, 35.4 FPS on ADE20K.
arXiv Detail & Related papers (2023-03-26T07:55:35Z)
- Fully Convolutional Networks for Panoptic Segmentation with Point-based Supervision [88.71403886207071]
We present a conceptually simple, strong, and efficient framework for fully- and weakly-supervised panoptic segmentation, called Panoptic FCN.
Our approach aims to represent and predict foreground things and background stuff in a unified fully convolutional pipeline.
Panoptic FCN encodes each object instance or stuff category with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly.
arXiv Detail & Related papers (2021-08-17T15:28:53Z)
- A Coarse-to-Fine Instance Segmentation Network with Learning Boundary Representation [10.967299485260163]
Boundary-based instance segmentation has drawn much attention because of its attractive efficiency.
Existing methods suffer from the difficulty of long-distance regression.
We propose a coarse-to-fine module to address the problem.
arXiv Detail & Related papers (2021-06-18T16:37:28Z)
- Fully Convolutional Networks for Panoptic Segmentation [91.84686839549488]
We present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN.
Our approach aims to represent and predict foreground things and background stuff in a unified fully convolutional pipeline.
Panoptic FCN encodes each object instance or stuff category into a specific kernel weight with the proposed kernel generator.
arXiv Detail & Related papers (2020-12-01T18:31:41Z)
- The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation [85.153426159438]
We propose Basis-based Instance Segmentation (B2Inst) to learn a global boundary representation that can complement existing global-mask-based methods.
Our B2Inst leads to consistent improvements and accurately parses out the instance boundaries in a scene.
arXiv Detail & Related papers (2020-11-26T11:26:06Z)
- Unifying Training and Inference for Panoptic Segmentation [111.44758195510838]
We present an end-to-end network to bridge the gap between training and inference for panoptic segmentation.
Our system sets new records on the popular street scene dataset, Cityscapes, achieving 61.4 PQ with a ResNet-50 backbone.
Our network flexibly works with and without object mask cues, performing competitively under both settings.
arXiv Detail & Related papers (2020-01-14T18:58:24Z)
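Several entries above (YOSO, Panoptic FCN) predict masks via dynamic convolutions: each object instance or stuff category is encoded as a kernel, which is then convolved with a shared high-resolution feature map to produce that segment's mask logits. A minimal NumPy sketch of this idea with 1x1 kernels (shapes and values are illustrative assumptions, not the papers' actual architectures):

```python
import numpy as np

# Hypothetical sizes: N segment kernels, C channels, an H x W feature map.
N, C, H, W = 3, 8, 16, 16
rng = np.random.default_rng(0)

kernels = rng.standard_normal((N, C))      # one 1x1 kernel per thing/stuff segment
features = rng.standard_normal((C, H, W))  # shared high-resolution feature map

# Dynamic 1x1 convolution: each kernel yields one mask-logit map.
logits = np.einsum('nc,chw->nhw', kernels, features)
masks = logits > 0  # binarize; a real system applies sigmoid + thresholding
print(logits.shape)  # (3, 16, 16)
```

Because the kernels are generated per image (rather than being fixed network weights), the same convolution pipeline handles a variable number of instances and stuff categories in one pass.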
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.