Panoptic-PartFormer: Learning a Unified Model for Panoptic Part
Segmentation
- URL: http://arxiv.org/abs/2204.04655v1
- Date: Sun, 10 Apr 2022 11:16:45 GMT
- Title: Panoptic-PartFormer: Learning a Unified Model for Panoptic Part
Segmentation
- Authors: Xiangtai Li, Shilin Xu, Yibo Yang, Guangliang Cheng, Yunhai Tong,
Dacheng Tao
- Abstract summary: Panoptic Part Segmentation (PPS) aims to unify panoptic segmentation and part segmentation into one task.
We design the first end-to-end unified method named Panoptic-PartFormer.
Our Panoptic-PartFormer achieves new state-of-the-art results on both the Cityscapes PPS and Pascal Context PPS datasets.
- Score: 76.9420522112248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoptic Part Segmentation (PPS) aims to unify panoptic segmentation and part
segmentation into one task. Previous work mainly uses separate approaches
to handle thing, stuff, and part predictions individually, without any
shared computation or task association. In this work, we aim to unify
these tasks at the architectural level, designing the first end-to-end unified
method named Panoptic-PartFormer. In particular, motivated by the recent
progress in Vision Transformers, we model things, stuff, and parts as object
queries and directly learn to optimize all three predictions as a unified
mask prediction and classification problem. We design a decoupled decoder to
generate part features and thing/stuff features respectively. We then use
all the queries and their corresponding features to perform joint and
iterative reasoning. The final masks are obtained via an inner product
between the queries and their corresponding features (see the sketch below).
Extensive ablation studies and analysis demonstrate the effectiveness of our
framework. Our Panoptic-PartFormer achieves new state-of-the-art results on
both the Cityscapes PPS and Pascal Context PPS datasets while decreasing
GFLOPs by at least 70% and parameters by at least 50%. In particular, we
obtain a 3.4% relative improvement with a ResNet50 backbone and a 10%
improvement with a Swin Transformer on the Pascal Context PPS dataset. To
the best of our knowledge, we are the first to solve the PPS problem via a
unified, end-to-end transformer model. Given its effectiveness and
conceptual simplicity, we hope our Panoptic-PartFormer can serve as a good
baseline and aid future unified research for PPS. Our code and models will be
available at https://github.com/lxtGH/Panoptic-PartFormer.
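The inner-product mask prediction described in the abstract is the standard mechanism of query-based segmenters; the part of the design specific to this paper is that things, stuff, and parts all share it. Below is a minimal PyTorch sketch of the idea, with hypothetical tensor names and shapes (random stand-ins, not the authors' actual code):

import torch

# Hypothetical sizes: batch, thing/stuff queries, part queries,
# embedding dim, and feature-map resolution.
B, N_ts, N_part, C, H, W = 2, 100, 50, 256, 64, 64

# Object queries for things/stuff and for parts, refined jointly
# and iteratively by the decoder.
ts_queries = torch.randn(B, N_ts, C)
part_queries = torch.randn(B, N_part, C)

# Decoupled decoder outputs: a separate feature map per branch.
ts_features = torch.randn(B, C, H, W)    # thing/stuff branch
part_features = torch.randn(B, C, H, W)  # part branch

# Final masks: inner product between each query and every pixel embedding.
ts_masks = torch.einsum('bnc,bchw->bnhw', ts_queries, ts_features)
part_masks = torch.einsum('bnc,bchw->bnhw', part_queries, part_features)

# Per-query class logits come from a linear head over the same queries,
# so segmentation reduces to unified mask prediction plus classification.
cls_head = torch.nn.Linear(C, 19 + 1)  # 19 classes + "no object" (illustrative)
ts_logits = cls_head(ts_queries)       # shape (B, N_ts, 20)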
Related papers
- Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations [2.087148326341881]
Part-aware panoptic segmentation (PPS) requires (a) that each foreground object and background region in an image is segmented and classified, and (b) that all parts within foreground objects are segmented, classified and linked to their parent object.
Existing methods approach PPS by separately conducting object-level and part-level segmentation.
We propose Task-aligned Part-aware Panoptic Segmentation (TAPPS).
TAPPS learns to predict part-level segments that are linked to individual parent objects, aligning the learning objective with the task objective, and allowing TAPPS to leverage joint object-part representations.
arXiv Detail & Related papers (2024-06-14T15:20:46Z)
- JPPF: Multi-task Fusion for Consistent Panoptic-Part Segmentation [12.19926973291957]
Part-aware panoptic segmentation is a computer vision problem that aims to provide a semantic understanding of the scene at multiple levels of granularity.
We present our Joint Panoptic Part Fusion (JPPF) that combines the three individual segmentations effectively to obtain a panoptic-part segmentation.
arXiv Detail & Related papers (2023-11-30T15:17:46Z)
- You Only Segment Once: Towards Real-Time Panoptic Segmentation [68.91492389185744]
YOSO is a real-time panoptic segmentation framework.
YOSO predicts masks via dynamic convolutions between panoptic kernels and image feature maps (see the sketch after this entry).
YOSO achieves 46.4 PQ, 45.6 FPS on COCO; 52.5 PQ, 22.6 FPS on Cityscapes; 38.0 PQ, 35.4 FPS on ADE20K.
arXiv Detail & Related papers (2023-03-26T07:55:35Z)
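The dynamic convolution YOSO describes (and that Panoptic FCN, further down this list, applies in the same spirit through its kernel generator) treats each predicted kernel as the weights of a 1x1 convolution over a shared feature map, which is mathematically the same inner product as in the sketch above. A hedged PyTorch illustration with made-up shapes:

import torch
import torch.nn.functional as F

B, N, C, H, W = 2, 100, 256, 128, 256

# Panoptic kernels (one per candidate segment) and the shared feature map;
# in YOSO both are predicted by the network, here they are random stand-ins.
kernels = torch.randn(B, N, C)
features = torch.randn(B, C, H, W)

# Dynamic convolution written as an einsum over the channel dimension.
masks = torch.einsum('bnc,bchw->bnhw', kernels, features)

# Equivalent per-image formulation using an explicit 1x1 convolution.
masks_conv = torch.cat([
    F.conv2d(features[b:b+1], kernels[b].view(N, C, 1, 1))
    for b in range(B)
])
assert torch.allclose(masks, masks_conv, atol=1e-4)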
- Position-Guided Point Cloud Panoptic Segmentation Transformer [118.17651196656178]
This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline.
We observe that instances in sparse point clouds are small relative to the whole scene and often have similar geometry while lacking distinctive appearance, conditions that are rare in the image domain.
The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% on the SemanticKITTI and nuScenes benchmarks, respectively.
arXiv Detail & Related papers (2023-03-23T17:59:02Z)
- PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation [153.76253697804225]
Panoptic Part Segmentation (PPS) unifies panoptic and part segmentation into one task.
We design the first end-to-end unified framework, Panoptic-PartFormer.
Our models can serve as a strong baseline and aid future research in PPS.
arXiv Detail & Related papers (2023-01-03T05:30:56Z)
- Multi-task Fusion for Efficient Panoptic-Part Segmentation [12.650574326251023]
We introduce a novel network that generates semantic, instance, and part segmentation using a shared encoder.
To fuse the predictions of all three heads efficiently, we introduce a parameter-free joint fusion module (see the sketch after this entry).
Our method is evaluated on the Cityscapes Panoptic Parts (CPP) and Pascal Panoptic Parts (PPP) datasets.
arXiv Detail & Related papers (2022-12-15T09:04:45Z)
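One plausible reading of a parameter-free joint fusion (and of the closely related JPPF entry above) is that per-pixel argmaxes of the three heads are combined by fixed rules, with no learned weights. The sketch below illustrates that reading; the class ids, the set of part-divisible classes, and the id-encoding scheme are all invented for illustration and are not the paper's exact procedure:

import torch

H, W = 512, 1024
NUM_SEM, NUM_PART = 19, 10

# Per-pixel outputs of the three heads that share one encoder.
sem_scores = torch.randn(NUM_SEM, H, W)    # semantic head
part_scores = torch.randn(NUM_PART, H, W)  # part head
inst_ids = torch.randint(0, 8, (H, W))     # instance head (id per pixel)

# Hypothetical semantic classes that are divisible into parts.
CLASSES_WITH_PARTS = torch.tensor([11, 13])

sem_labels = sem_scores.argmax(dim=0)
part_labels = part_scores.argmax(dim=0)

# Parameter-free fusion: encode class and instance into one panoptic id,
# and keep a part label only where the semantic class actually has parts.
panoptic = sem_labels * 1000 + inst_ids
has_parts = torch.isin(sem_labels, CLASSES_WITH_PARTS)
part_map = torch.where(has_parts, part_labels,
                       torch.full_like(part_labels, -1))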
- Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition [80.74495836502919]
In this work, we focus on joint human fashion segmentation and attribute recognition.
We introduce the object query for segmentation and the attribute query for attribute prediction.
For the attribute stream, we design a novel Multi-Layer Rendering module to explore more fine-grained features.
arXiv Detail & Related papers (2022-04-10T11:11:10Z)
- Fully Convolutional Networks for Panoptic Segmentation [91.84686839549488]
We present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN.
Our approach aims to represent and predict foreground things and background stuff in a unified fully convolutional pipeline.
Panoptic FCN encodes each object instance or stuff category into a specific kernel weight with the proposed kernel generator.
arXiv Detail & Related papers (2020-12-01T18:31:41Z)