Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations
- URL: http://arxiv.org/abs/2406.10114v1
- Date: Fri, 14 Jun 2024 15:20:46 GMT
- Title: Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations
- Authors: Daan de Geus, Gijs Dubbelman
- Abstract summary: Part-aware panoptic segmentation (PPS) requires (a) that each foreground object and background region in an image is segmented and classified, and (b) that all parts within foreground objects are segmented, classified and linked to their parent object.
Existing methods approach PPS by separately conducting object-level and part-level segmentation.
We propose Task-Aligned Part-aware Panoptic Segmentation (TAPPS).
TAPPS learns to predict part-level segments that are linked to individual parent objects, aligning the learning objective with the task objective, and allowing TAPPS to leverage joint object-part representations.
- Score: 2.087148326341881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Part-aware panoptic segmentation (PPS) requires (a) that each foreground object and background region in an image is segmented and classified, and (b) that all parts within foreground objects are segmented, classified and linked to their parent object. Existing methods approach PPS by separately conducting object-level and part-level segmentation. However, their part-level predictions are not linked to individual parent objects. Therefore, their learning objective is not aligned with the PPS task objective, which harms the PPS performance. To solve this, and make more accurate PPS predictions, we propose Task-Aligned Part-aware Panoptic Segmentation (TAPPS). This method uses a set of shared queries to jointly predict (a) object-level segments, and (b) the part-level segments within those same objects. As a result, TAPPS learns to predict part-level segments that are linked to individual parent objects, aligning the learning objective with the task objective, and allowing TAPPS to leverage joint object-part representations. With experiments, we show that TAPPS considerably outperforms methods that predict objects and parts separately, and achieves new state-of-the-art PPS results.
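As a rough illustration of the shared-query idea described in the abstract, the sketch below shows a Mask2Former-style head in which each query yields an object class, an object mask, and a set of part masks tied to that same query, so every part prediction is linked to its parent object by construction. This is a minimal sketch under assumed names and dimensions, not the authors' implementation; the transformer decoder that refines the queries against pixel features, the Hungarian matching, and the restriction of part classes per object class are all omitted.

```python
# Hedged sketch (not the authors' code): one shared set of queries predicts, per query,
# (a) an object class + object mask and (b) part masks linked to that same query/object.
import torch
import torch.nn as nn

class JointObjectPartHead(nn.Module):
    def __init__(self, num_queries=100, embed_dim=256,
                 num_obj_classes=20, num_part_classes=10):
        super().__init__()
        self.queries = nn.Embedding(num_queries, embed_dim)        # shared queries
        self.obj_cls = nn.Linear(embed_dim, num_obj_classes + 1)   # +1 for "no object"
        self.obj_mask_embed = nn.Linear(embed_dim, embed_dim)
        # Per-query part embeddings: one embedding per part class, so each part mask
        # is tied to its parent query (and therefore to its parent object).
        self.part_mask_embed = nn.Linear(embed_dim, num_part_classes * embed_dim)
        self.num_part_classes = num_part_classes
        self.embed_dim = embed_dim

    def forward(self, pixel_feats):                 # pixel_feats: (B, C, H, W)
        B, C, H, W = pixel_feats.shape
        q = self.queries.weight.unsqueeze(0).expand(B, -1, -1)     # (B, Q, C)
        # In a full model, q would first be refined by transformer decoder layers
        # attending to pixel_feats; that refinement is omitted in this sketch.
        obj_logits = self.obj_cls(q)                                # (B, Q, K+1)
        obj_masks = torch.einsum('bqc,bchw->bqhw',
                                 self.obj_mask_embed(q), pixel_feats)
        part_embed = self.part_mask_embed(q).reshape(
            B, -1, self.num_part_classes, self.embed_dim)           # (B, Q, P, C)
        part_masks = torch.einsum('bqpc,bchw->bqphw', part_embed, pixel_feats)
        # part_masks[b, q] are the part segments of the object predicted by query q.
        return obj_logits, obj_masks, part_masks
```

In a full training setup, each query would be matched to a ground-truth object and supervised jointly on its object-level and part-level targets, which is what aligns the learning objective with the PPS task objective.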
Related papers
- 1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation [72.54357831350762]
We propose a semantic embedding video object segmentation model and use the salient features of objects as query representations.
We trained our model on a large-scale video object segmentation dataset.
Our model achieves first place (84.45%) on the test set of the Complex Video Object Segmentation challenge.
arXiv Detail & Related papers (2024-06-07T03:13:46Z) - JPPF: Multi-task Fusion for Consistent Panoptic-Part Segmentation [12.19926973291957]
Part-aware panoptic segmentation is a problem of computer vision that aims to provide a semantic understanding of the scene at multiple levels of granularity.
We present our Joint Panoptic Part Fusion (JPPF), which effectively combines the three individual segmentations (semantic, instance, and part) to obtain a panoptic-part segmentation.
arXiv Detail & Related papers (2023-11-30T15:17:46Z) - PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning [44.48704588318053]
We develop a novel method termed PartSeg for few-shot part segmentation based on multimodal learning.
We conduct extensive experiments on the PartImageNet and Pascal-Part datasets.
arXiv Detail & Related papers (2023-08-24T13:03:42Z) - PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation [153.76253697804225]
Panoptic Part Segmentation (PPS) unifies panoptic and part segmentation into one task.
We design the first end-to-end unified framework, Panoptic-PartFormer.
Our models can serve as a strong baseline and aid future research in PPS.
arXiv Detail & Related papers (2023-01-03T05:30:56Z) - Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation [76.9420522112248]
Panoptic Part Segmentation (PPS) aims to unify panoptic segmentation and part segmentation into one task.
We design the first end-to-end unified method named Panoptic-PartFormer.
Our Panoptic-PartFormer achieves the new state-of-the-art results on both Cityscapes PPS and Pascal Context PPS datasets.
arXiv Detail & Related papers (2022-04-10T11:16:45Z) - Part-level Action Parsing via a Pose-guided Coarse-to-Fine Framework [108.70949305791201]
Part-level Action Parsing (PAP) aims to not only predict the video-level action but also recognize the frame-level fine-grained actions or interactions of body parts for each person in the video.
In particular, our framework first predicts the video-level class of the input video, then localizes the body parts and predicts the part-level action.
Our framework achieves state-of-the-art performance and outperforms existing methods with over a 31.10% ROC score.
arXiv Detail & Related papers (2022-03-09T01:30:57Z) - 3D Compositional Zero-shot Learning with DeCompositional Consensus [102.7571947144639]
We argue that part knowledge should be composable beyond the observed object classes.
We present 3D Compositional Zero-shot Learning as a problem of part generalization from seen to unseen object classes.
arXiv Detail & Related papers (2021-11-29T16:34:53Z) - Part-aware Panoptic Segmentation [3.342126234995932]
Part-aware Panoptic Segmentation (PPS) aims to understand a scene at multiple levels of abstraction.
We provide consistent annotations on two commonly used datasets: Cityscapes and Pascal VOC.
We present a single metric to evaluate PPS, called Part-aware Panoptic Quality (PartPQ); a sketch of its PQ-style form is given after this list of related papers.
arXiv Detail & Related papers (2021-06-11T12:48:07Z) - Personal Fixations-Based Object Segmentation with Object Localization and Boundary Preservation [60.41628937597989]
We focus on Personal Fixations-based Object Segmentation (PFOS) to address issues in previous studies.
We propose a novel network based on Object Localization and Boundary Preservation (OLBP) to segment the gazed objects.
OLBP is organized in a mixed bottom-up and top-down manner with multiple types of deep supervision.
arXiv Detail & Related papers (2021-01-22T09:20:47Z)
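For reference, the PartPQ metric mentioned above follows the standard Panoptic Quality formulation, with the IoU term made part-aware. The block below is a hedged sketch of that form (assuming the usual matching of predicted to ground-truth segments at the object level); see the cited paper for the exact per-class definitions.

```latex
% Sketch of the PQ-style form that PartPQ follows (de Geus et al., 2021).
% IOU_p is a part-aware IoU: for classes annotated with parts it is computed over the
% part-level segmentation inside a matched segment pair; for classes without parts it
% reduces to the standard mask IoU.
\mathrm{PartPQ} = \frac{\sum_{(p,g) \in \mathit{TP}} \mathrm{IOU}_p(p,g)}
                       {|\mathit{TP}| + \tfrac{1}{2}|\mathit{FP}| + \tfrac{1}{2}|\mathit{FN}|}
```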
This list is automatically generated from the titles and abstracts of the papers in this site.