PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation
- URL: http://arxiv.org/abs/2301.00954v4
- Date: Sun, 1 Sep 2024 01:27:58 GMT
- Title: PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation
- Authors: Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, Dacheng Tao
- Abstract summary: Panoptic Part Segmentation (PPS) unifies panoptic and part segmentation into one task.
We design the first end-to-end unified framework, Panoptic-PartFormer.
Our models can serve as a strong baseline and aid future research in PPS.
- Score: 153.76253697804225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoptic Part Segmentation (PPS) unifies panoptic and part segmentation into one task. Previous works utilize separate approaches to handle things, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, Panoptic-PartFormer. Moreover, we find that the previous metric, PartPQ, is biased toward PQ. To handle both issues, we first design a meta-architecture that decouples part features from things/stuff features. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term this model Panoptic-PartFormer. Second, we propose a new metric, Part-Whole Quality (PWQ), to better measure this task from pixel-region and part-whole perspectives; it also decouples the errors of part segmentation and panoptic segmentation. Third, inspired by Mask2Former and based on our meta-architecture, we propose Panoptic-PartFormer++ with a new part-whole cross-attention scheme, implemented via masked cross-attention, to further boost part segmentation quality. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with the previous Panoptic-PartFormer, Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset, setting new state-of-the-art results on both. Our models can serve as a strong baseline and aid future research in PPS. The source code and trained models will be available at https://github.com/lxtGH/Panoptic-PartFormer.
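To make the part-whole cross-attention idea concrete, here is a minimal PyTorch sketch of masked cross-attention in which part queries attend only to pixels covered by their parent thing/stuff masks. All names, shapes, and the 0.5 gating threshold are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def masked_cross_attention(part_queries, pixel_features, whole_masks):
    """Illustrative masked cross-attention: each part query attends only to
    pixels inside the (predicted) whole-level mask of its parent segment.

    part_queries:   (Q, C)    part-level object queries
    pixel_features: (C, H, W) per-pixel decoder features
    whole_masks:    (Q, H, W) mask logits of the parent things/stuff segments
    """
    C, H, W = pixel_features.shape
    keys = pixel_features.flatten(1).t()             # (H*W, C)
    attn = part_queries @ keys.t() / C ** 0.5        # (Q, H*W) attention logits

    # Mask2Former-style gating: pixels outside the parent segment get -inf
    # before the softmax, so attention stays inside the whole-level mask.
    inactive = whole_masks.flatten(1).sigmoid() < 0.5    # (Q, H*W) bool
    attn = attn.masked_fill(inactive, float("-inf"))

    attn = attn.softmax(dim=-1)
    # Guard against queries whose mask is empty (all -inf rows give NaN);
    # such queries then fall back to their unchanged content via the residual.
    attn = torch.nan_to_num(attn, nan=0.0)
    return part_queries + attn @ keys                # (Q, C) updated part queries
```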
Related papers
- Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations [2.087148326341881]
Part-aware panoptic segmentation (PPS) requires (a) that each foreground object and background region in an image is segmented and classified, and (b) that all parts within foreground objects are segmented, classified and linked to their parent object.
Existing methods approach PPS by separately conducting object-level and part-level segmentation.
We propose Task-aligned Part-aware Panoptic Segmentation (TAPPS).
TAPPS learns to predict part-level segments that are linked to individual parent objects, aligning the learning objective with the task objective, and allowing TAPPS to leverage joint object-part representations.
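As a hedged illustration of joint object-part representations, the sketch below lets every object query predict its own object mask plus a fixed set of part masks, so each part is tied to its parent object by construction. The class name, dimensions, and the fixed-K part scheme are hypothetical, not TAPPS's actual design.

```python
import torch
import torch.nn as nn

class JointObjectPartHead(nn.Module):
    """Hypothetical sketch: every object query emits one object mask plus K
    part masks, linking each part to its parent object with no post-hoc
    matching step."""

    def __init__(self, dim=256, num_parts=4):
        super().__init__()
        self.obj_kernel = nn.Linear(dim, dim)                 # kernel for the object mask
        self.part_kernels = nn.Linear(dim, dim * num_parts)   # kernels for K part masks
        self.num_parts = num_parts

    def forward(self, queries, pixel_features):
        # queries: (Q, dim), pixel_features: (dim, H, W)
        dim, H, W = pixel_features.shape
        feats = pixel_features.flatten(1)                         # (dim, H*W)
        obj_masks = self.obj_kernel(queries) @ feats              # (Q, H*W)
        part_k = self.part_kernels(queries).view(-1, self.num_parts, dim)
        part_masks = part_k @ feats                               # (Q, K, H*W)
        return obj_masks.view(-1, H, W), part_masks.view(-1, self.num_parts, H, W)
```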
arXiv Detail & Related papers (2024-06-14T15:20:46Z)
- JPPF: Multi-task Fusion for Consistent Panoptic-Part Segmentation [12.19926973291957]
Part-aware panoptic segmentation is a problem of computer vision that aims to provide a semantic understanding of the scene at multiple levels of granularity.
We present our Joint Panoptic Part Fusion (JPPF) that combines the three individual segmentations effectively to obtain a panoptic-part segmentation.
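The following NumPy sketch illustrates one plausible form of parameter-free fusion in this spirit: merge the semantic and instance maps into a panoptic map, then keep part labels only where they sit on a class that actually has parts. The ID-encoding scheme and the function signature are assumptions, not JPPF's code.

```python
import numpy as np

def fuse_panoptic_parts(semantic, instance, parts, classes_with_parts):
    """Illustrative parameter-free fusion (not the authors' implementation).

    semantic: (H, W) int semantic class ids
    instance: (H, W) int instance ids (0 = no instance)
    parts:    (H, W) int part class ids (0 = no part)
    classes_with_parts: set of semantic ids whose parts are annotated
    """
    has_instance = instance > 0
    # Common panoptic id convention: class_id * 1000 + instance_id for things.
    panoptic = np.where(has_instance, semantic * 1000 + instance, semantic)

    # Keep a part label only where its pixel lies on a class that has parts,
    # which enforces consistency between the part and panoptic predictions.
    valid_parent = np.isin(semantic, list(classes_with_parts))
    fused_parts = np.where(valid_parent, parts, 0)
    return panoptic, fused_parts
```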
arXiv Detail & Related papers (2023-11-30T15:17:46Z)
- You Only Segment Once: Towards Real-Time Panoptic Segmentation [68.91492389185744]
YOSO is a real-time panoptic segmentation framework.
YOSO predicts masks via dynamic convolutions between panoptic kernels and image feature maps.
YOSO achieves 46.4 PQ, 45.6 FPS on COCO; 52.5 PQ, 22.6 FPS on Cityscapes; 38.0 PQ, 35.4 FPS on ADE20K.
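A dynamic convolution between kernels and a feature map reduces, for 1x1 kernels, to a matrix product; the short sketch below shows that mechanism. Shapes and names are illustrative assumptions, not YOSO's implementation.

```python
import torch

def dynamic_conv_masks(kernels, features):
    """Sketch of kernel-based mask prediction: each learned panoptic kernel
    acts as a 1x1 conv over the shared feature map, producing one mask logit
    map per kernel.

    kernels:  (N, C)    one C-dim kernel per predicted segment
    features: (C, H, W) shared image feature map
    """
    masks = torch.einsum("nc,chw->nhw", kernels, features)  # (N, H, W) logits
    return masks.sigmoid()

# Example: 100 kernels over a 256-channel feature map.
masks = dynamic_conv_masks(torch.randn(100, 256), torch.randn(256, 64, 128))
```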
arXiv Detail & Related papers (2023-03-26T07:55:35Z)
- Position-Guided Point Cloud Panoptic Segmentation Transformer [118.17651196656178]
This work begins by applying the query-based segmentation paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline.
We observe that instances in sparse point clouds are small relative to the whole scene and often share similar geometry while lacking distinctive appearance cues for segmentation, conditions that are rare in the image domain.
The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% on the SemanticKITTI and nuScenes benchmarks, respectively.
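As a rough illustration of position guidance, the sketch below gives each query a learnable 3D reference point whose embedding is added to the query content, so geometrically distinct but similar-looking instances remain separable. This is an assumption-laden sketch, not P3Former's actual module.

```python
import torch
import torch.nn as nn

class PositionGuidedQueries(nn.Module):
    """Illustrative sketch (not P3Former's code): when appearance is weak,
    queries can be guided by position via a learnable 3D reference point."""

    def __init__(self, num_queries=128, dim=256):
        super().__init__()
        self.content = nn.Embedding(num_queries, dim)
        self.ref_points = nn.Parameter(torch.rand(num_queries, 3))  # (x, y, z) in [0, 1]
        self.pos_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self):
        # Position embeddings separate instances that look alike but sit in
        # different regions of the sparse point cloud.
        return self.content.weight + self.pos_mlp(self.ref_points)  # (num_queries, dim)
```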
arXiv Detail & Related papers (2023-03-23T17:59:02Z)
- Multi-task Fusion for Efficient Panoptic-Part Segmentation [12.650574326251023]
We introduce a novel network that generates semantic, instance, and part segmentation using a shared encoder.
To fuse the predictions of all three heads efficiently, we introduce a parameter-free joint fusion module.
Our method is evaluated on the Cityscapes Panoptic Parts (CPP) and Pascal Panoptic Parts (PPP) datasets.
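A minimal sketch of the shared-computation idea described above, with placeholder layer sizes and head designs rather than the paper's architecture:

```python
import torch.nn as nn

class SharedEncoderThreeHeads(nn.Module):
    """Hypothetical sketch: one backbone feeds semantic, instance, and part
    heads, so the expensive features are computed once and fused later by a
    parameter-free module."""

    def __init__(self, channels=256, num_sem=19, num_parts=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.semantic_head = nn.Conv2d(channels, num_sem, 1)
        self.instance_head = nn.Conv2d(channels, 1, 1)   # e.g. a center heatmap
        self.part_head = nn.Conv2d(channels, num_parts, 1)

    def forward(self, image):
        feats = self.encoder(image)  # shared features, computed once
        return self.semantic_head(feats), self.instance_head(feats), self.part_head(feats)
```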
arXiv Detail & Related papers (2022-12-15T09:04:45Z)
- Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation [76.9420522112248]
Panoptic Part Segmentation (PPS) aims to unify panoptic segmentation and part segmentation into one task.
We design the first end-to-end unified method named Panoptic-PartFormer.
Our Panoptic-PartFormer achieves new state-of-the-art results on both the Cityscapes PPS and Pascal Context PPS datasets.
arXiv Detail & Related papers (2022-04-10T11:16:45Z)
- PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation [90.26723865198348]
We present PolyphonicFormer, a vision transformer that unifies all the sub-tasks of depth-aware video panoptic segmentation (DVPS).
Our method explores the relationship between depth estimation and panoptic segmentation via query-based learning.
Our method ranks 1st on the ICCV-2021 BMTT Challenge video + depth track.
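One way to picture query-based coupling of the two tasks: the same query decodes both a segment mask and a per-segment depth map, as in the hedged sketch below. The projections and shapes are illustrative assumptions, not the paper's code.

```python
import torch.nn as nn

class MaskAndDepthHead(nn.Module):
    """Hypothetical sketch of unified query learning: one query produces both
    a mask and a depth map, so segmentation and depth share a representation."""

    def __init__(self, dim=256):
        super().__init__()
        self.mask_proj = nn.Linear(dim, dim)
        self.depth_proj = nn.Linear(dim, dim)

    def forward(self, queries, features):
        # queries: (Q, dim), features: (dim, H, W)
        dim, H, W = features.shape
        flat = features.flatten(1)                                        # (dim, H*W)
        masks = (self.mask_proj(queries) @ flat).view(-1, H, W)           # mask logits
        depth = (self.depth_proj(queries) @ flat).view(-1, H, W).relu()   # nonnegative depth
        return masks, depth
```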
arXiv Detail & Related papers (2021-12-05T14:31:47Z)
- Part-aware Panoptic Segmentation [3.342126234995932]
Part-aware Panoptic Segmentation (PPS) aims to understand a scene at multiple levels of abstraction.
We provide consistent annotations on two commonly used datasets: Cityscapes and Pascal VOC.
We present a single metric to evaluate PPS, called Part-aware Panoptic Quality (PartPQ).
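To convey the flavor of a part-aware quality metric, the toy sketch below aggregates matched-segment quality PQ-style but swaps in a part-level IoU for classes that have parts; it omits the matching step and the false-positive/false-negative terms of the real PartPQ definition, so it is a simplification, not the metric itself.

```python
def part_pq_sketch(matches, classes_with_parts):
    """Toy sketch of a PartPQ-style score over already-matched segments.

    matches: list of dicts with keys
        'cls'      - semantic class id
        'iou'      - segment-level IoU of the matched pair
        'part_iou' - part-level IoU inside the matched segment (or None)
    classes_with_parts: set of class ids that have part annotations
    """
    tp_quality = sum(
        m["part_iou"] if m["cls"] in classes_with_parts else m["iou"]
        for m in matches
    )
    # The real metric also divides by TP + FP/2 + FN/2, as in PQ.
    return tp_quality / max(len(matches), 1)
```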
arXiv Detail & Related papers (2021-06-11T12:48:07Z)
- Video Panoptic Segmentation [117.08520543864054]
We propose and explore a new video extension of panoptic segmentation, called video panoptic segmentation.
To invigorate research on this new task, we present two types of video panoptic datasets.
We propose a novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames.
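Of these outputs, instance id tracking is the one specific to video; a common, simplified way to realize it is to match per-instance embeddings across frames, as in this hedged sketch (assumed shapes, not VPSNet's actual association logic).

```python
import torch
import torch.nn.functional as F

def associate_instances(prev_embeds, curr_embeds):
    """Illustrative cross-frame association by cosine similarity.

    prev_embeds: (M, C) embeddings of instances in frame t-1
    curr_embeds: (N, C) embeddings of instances in frame t
    Returns, for each current instance, the index of its best previous match.
    """
    sim = F.normalize(curr_embeds, dim=1) @ F.normalize(prev_embeds, dim=1).t()  # (N, M)
    return sim.argmax(dim=1)  # greedy matching; real trackers also gate by score
```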
arXiv Detail & Related papers (2020-06-19T19:35:47Z)