2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation
- URL: http://arxiv.org/abs/2406.00500v1
- Date: Sat, 1 Jun 2024 17:03:16 GMT
- Title: 2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation
- Authors: Biao Wu, Diankai Zhang, Si Gao, Chengjian Zheng, Shaoli Liu, Ning Wang,
- Abstract summary: Video Panoptic Segmentation (VPS) aims to simultaneously classify, track, and segment all objects in a video.
We propose a robust integrated video panoptic segmentation solution.
Our method achieves state-of-the-art performance with VPQ scores of 56.36 and 57.12 in the development and test phases, respectively.
- Score: 12.274092278786966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video Panoptic Segmentation (VPS) is a challenging task that extends image panoptic segmentation. VPS aims to simultaneously classify, track, and segment all objects in a video, including both things and stuff, and it has wide application in downstream tasks such as video understanding, video editing, and autonomous driving. To handle video panoptic segmentation in the wild, we propose a robust integrated video panoptic segmentation solution. We use the DVIS++ framework as our baseline to generate the initial masks. Then, we add an image semantic segmentation model to further improve performance on semantic classes. Our method achieves state-of-the-art performance with VPQ scores of 56.36 and 57.12 in the development and test phases, respectively, and ultimately ranked 2nd in the VPS track of the PVUW Challenge at CVPR 2024.
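The abstract says an image semantic segmentation model is added on top of the DVIS++ masks to improve semantic (stuff) classes, but it does not say how the two outputs are combined. The following is a minimal sketch under an assumed, hypothetical fusion rule: each stuff segment in a frame's panoptic map is relabeled with the majority stuff class that the semantic model predicts inside it, while thing segments are left untouched. The function name `refine_stuff_labels` and the input formats are illustrative assumptions, not the authors' code.

```python
from collections import Counter


def refine_stuff_labels(panoptic, segment_classes, semantic, stuff_classes):
    """Relabel stuff segments using per-pixel predictions from a semantic model.

    Hypothetical fusion rule (an assumption, not the paper's method).
    panoptic: 2D list of segment ids for one frame.
    segment_classes: dict mapping segment id -> class id from the video model.
    semantic: 2D list of class ids from the image semantic segmentation model.
    stuff_classes: set of class ids treated as "stuff".
    Returns a new dict of segment id -> class id; the inputs are not modified.
    """
    votes = {}
    for pan_row, sem_row in zip(panoptic, semantic):
        for seg_id, sem_cls in zip(pan_row, sem_row):
            # Collect votes only inside segments currently labeled as stuff,
            # and only from semantic predictions that are themselves stuff.
            if segment_classes.get(seg_id) in stuff_classes and sem_cls in stuff_classes:
                votes.setdefault(seg_id, Counter())[sem_cls] += 1
    refined = dict(segment_classes)
    for seg_id, counter in votes.items():
        # Majority vote: the most frequent semantic class inside the segment wins.
        refined[seg_id] = counter.most_common(1)[0][0]
    return refined
```

For example, if segment 1 is labeled with stuff class 10 but the semantic model predicts stuff class 11 on most of its pixels, segment 1 is relabeled to 11; a thing segment keeps its original class. Restricting the change to class labels keeps the DVIS++ masks and track identities intact.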
Related papers
- VISA: Reasoning Video Object Segmentation via Large Language Models [64.33167989521357]
We introduce a new task, Reasoning Video Object Segmentation (ReasonVOS).
This task aims to generate a sequence of segmentation masks in response to implicit text queries that require complex reasoning abilities.
We introduce VISA (Video-based large language Instructed Assistant) to tackle ReasonVOS.
Date: 2024-07-16T02:29:29Z
- 1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation [81.50620771207329]
We investigate the effectiveness of static-dominant data and frame sampling on referring video object segmentation (RVOS).
Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeViS track of the PVUW Challenge.
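The J&F score reported above is the mean of region similarity J (mask IoU) and contour accuracy F (a boundary F-measure). The following is a minimal pure-Python sketch with masks given as sets of `(row, col)` pixels. It is a simplification, assumed for illustration: boundary matching uses a fixed pixel tolerance `tol` under Chebyshev distance, whereas the official DAVIS-style evaluation sets the tolerance relative to the image size, so scores from this sketch will not match benchmark numbers exactly.

```python
def region_similarity(pred, gt):
    """J: intersection over union of two masks given as sets of (row, col)."""
    if not pred and not gt:
        return 1.0  # both empty: treated as a perfect match
    return len(pred & gt) / len(pred | gt)


def boundary(mask):
    """Pixels of the mask that have at least one 4-neighbor outside it."""
    return {
        (r, c) for (r, c) in mask
        if any(n not in mask for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))
    }


def _matched_fraction(src, dst, tol):
    """Fraction of src pixels lying within Chebyshev distance tol of some dst pixel."""
    hits = sum(
        1 for (r, c) in src
        if any((r + dr, c + dc) in dst
               for dr in range(-tol, tol + 1)
               for dc in range(-tol, tol + 1))
    )
    return hits / len(src)


def contour_accuracy(pred, gt, tol=1):
    """F: harmonic mean of boundary precision and recall (simplified tolerance)."""
    bp, bg = boundary(pred), boundary(gt)
    if not bp and not bg:
        return 1.0
    if not bp or not bg:
        return 0.0
    precision = _matched_fraction(bp, bg, tol)
    recall = _matched_fraction(bg, bp, tol)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(pred, gt, tol=1):
    """Mean of region similarity and contour accuracy for one frame."""
    return (region_similarity(pred, gt) + contour_accuracy(pred, gt, tol)) / 2
```

For a video, the per-frame values are typically averaged over frames and objects before being reported as a single score.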
Date: 2024-06-11T08:05:26Z
- Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse point and box tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
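The summary says I-PT tracks sparse points and boxes jointly and filters out unstable points, without stating the filtering criterion. The following is a minimal sketch under a hypothetical stability criterion chosen only for illustration: a point track is kept if its tracker confidence stays above `min_conf` in every frame and it never jumps more than `max_jump` pixels between consecutive frames; the surviving points then define a box prompt. The names `filter_stable_points` and `box_prompt` and both thresholds are assumptions, not part of I-PT.

```python
import math


def filter_stable_points(tracks, min_conf=0.5, max_jump=20.0):
    """Keep point tracks that look stable (hypothetical criterion).

    tracks: dict point_id -> list of (x, y, confidence), one entry per frame.
    A track is kept only if every confidence >= min_conf and no step between
    consecutive frames moves farther than max_jump pixels.
    """
    stable = {}
    for pid, track in tracks.items():
        if any(conf < min_conf for _, _, conf in track):
            continue  # low tracker confidence in some frame
        jumps = (
            math.hypot(x1 - x0, y1 - y0)
            for (x0, y0, _), (x1, y1, _) in zip(track, track[1:])
        )
        if all(j <= max_jump for j in jumps):
            stable[pid] = track
    return stable


def box_prompt(stable, frame):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around stable points in one frame."""
    if not stable:
        return None
    xs = [track[frame][0] for track in stable.values()]
    ys = [track[frame][1] for track in stable.values()]
    return (min(xs), min(ys), max(xs), max(ys))
```

Deriving the box from the filtered points is one simple way to let the two prompt types reinforce each other: an outlier point that drifts onto the background is dropped before it can inflate the box.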
Date: 2024-06-08T14:25:57Z
- 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation [11.331198234997714]
The third Pixel-level Video Understanding in the Wild (PVUW CVPR 2024) challenge aims to advance the state of the art in video understanding.
This paper details our research work that won 1st place in the PVUW'24 VPS challenge.
Our solution builds on a giant vision transformer model (DINOv2 ViT-g) and proven multi-stage Decoupled Video Instance Segmentation frameworks.
Date: 2024-06-08T04:43:08Z
- 3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation [19.071113992267826]
We introduce a comprehensive approach centered on query-wise ensembling, supplemented by additional techniques.
Our proposed approach achieved a VPQ score of 57.01 on the VIPSeg test set, and ranked 3rd in the VPS track of the 3rd Pixel-level Video Understanding in the Wild Challenge.
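The summary names a query-wise ensemble but gives no details. The following is a minimal sketch of one plausible scheme, labeled here as an assumption: each model outputs a list of queries, each with class probabilities and a flattened per-pixel mask probability; queries from the two models are paired greedily by decreasing mask IoU (after thresholding at 0.5), paired queries are averaged element-wise, and unpaired queries are kept unchanged. `query_wise_ensemble` and `iou_thr` are hypothetical names, and the authors' actual matching and weighting may differ.

```python
def _binary_iou(m1, m2, thr=0.5):
    """IoU of two flattened probability masks after thresholding at thr."""
    a = {i for i, p in enumerate(m1) if p > thr}
    b = {i for i, p in enumerate(m2) if p > thr}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def query_wise_ensemble(queries_a, queries_b, iou_thr=0.5):
    """Fuse two models' query outputs (hypothetical scheme).

    Each query is (class_probs, mask_probs): two lists of floats, mask_probs
    being a flattened per-pixel foreground probability of equal length across
    models. Pairs are matched greedily by decreasing mask IoU; matched pairs
    are averaged element-wise, unmatched queries are kept unchanged.
    """
    candidates = sorted(
        ((_binary_iou(qa[1], qb[1]), i, j)
         for i, qa in enumerate(queries_a)
         for j, qb in enumerate(queries_b)),
        reverse=True,
    )
    used_a, used_b, fused = set(), set(), []
    for iou, i, j in candidates:
        if iou < iou_thr:
            break  # candidates are sorted, so every remaining one is below threshold
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        (ca, ma), (cb, mb) = queries_a[i], queries_b[j]
        fused.append(([(x + y) / 2 for x, y in zip(ca, cb)],
                      [(x + y) / 2 for x, y in zip(ma, mb)]))
    fused += [q for i, q in enumerate(queries_a) if i not in used_a]
    fused += [q for j, q in enumerate(queries_b) if j not in used_b]
    return fused
```

Ensembling per query rather than per pixel keeps each object's mask and class decision together, which is what makes the result directly usable as a panoptic segment.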
Date: 2024-06-06T12:22:56Z
- 3rd Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation [10.04177400017471]
We propose a robust integrated video panoptic segmentation solution.
In our solution, we represent both semantic and instance targets as a set of queries.
We then combine these queries with video features extracted by neural networks to predict segmentation masks.
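Once stuff and thing targets are both represented as queries with predicted masks, a single panoptic map can be produced by letting the queries compete per pixel. The following is a minimal sketch in the spirit of Mask2Former's panoptic inference, simplified and assumed for illustration: queries scoring below `score_thr` are dropped, each pixel goes to the query with the highest `score * mask_prob`, and the pixel is left void (id 0) unless that query's own mask probability there is at least 0.5. The overlap-ratio filtering used in practice is omitted, and the query-feature combination that produces the masks is not shown; `merge_queries_to_panoptic` is a hypothetical name.

```python
def merge_queries_to_panoptic(queries, num_pixels, score_thr=0.5):
    """Build a flattened panoptic map from per-query predictions.

    queries: list of (class_id, score, mask_probs) where mask_probs holds one
    foreground probability per pixel. Works the same for stuff and thing queries.
    Returns (seg_map, segments): seg_map gives a segment id per pixel (0 = void),
    segments maps each segment id to its class id.
    """
    kept = [q for q in queries if q[1] >= score_thr]
    seg_map = [0] * num_pixels
    segments = {}
    for p in range(num_pixels):
        best_k, best_val = None, 0.0
        for k, (_, score, mask) in enumerate(kept):
            val = score * mask[p]  # class confidence weighted by mask probability
            if val > best_val:
                best_k, best_val = k, val
        # Assign only when the winning query's own mask is confident at p.
        if best_k is not None and kept[best_k][2][p] >= 0.5:
            seg_id = best_k + 1  # 0 is reserved for void
            seg_map[p] = seg_id
            segments[seg_id] = kept[best_k][0]
    return seg_map, segments
```

Because every pixel is claimed by at most one query, the output is non-overlapping by construction, which is the defining property of a panoptic segmentation.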
Date: 2023-06-11T19:44:40Z
- 1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation [25.235404527487784]
Video panoptic segmentation is a challenging task that serves as the cornerstone of numerous downstream applications.
We believe that the decoupling strategy proposed by DVIS enables more effective utilization of temporal information for both "thing" and "stuff" objects.
Our method achieved a VPQ score of 51.4 and 53.7 in the development and test phases, respectively, and ranked 1st in the VPS track of the 2nd PVUW Challenge.
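VPQ, the metric behind the scores in these entries, extends panoptic quality to spatio-temporal tubes: within a window of k frames, a predicted tube and a ground-truth tube of the same class match when their IoU exceeds 0.5, each class scores the summed IoU of matches divided by |TP| + ½|FP| + ½|FN|, and VPQ for that window is the mean over classes. The benchmark then averages over several window sizes. The following is a minimal sketch for a single window, simplified by assumption: void and crowd regions are ignored, tubes are non-empty sets of `(t, row, col)` pixels, and the result is a fraction in [0, 1] rather than the 0 to 100 scale used when reporting.

```python
def vpq_window(pred_tubes, gt_tubes):
    """VPQ for one temporal window (simplified sketch).

    pred_tubes, gt_tubes: lists of (class_id, pixels), where pixels is a
    non-empty set of (t, row, col) tuples covering the frames of the window.
    A predicted and a ground-truth tube match when they share a class and
    their IoU exceeds 0.5; with non-overlapping segments such matches are unique.
    """
    classes = {c for c, _ in pred_tubes} | {c for c, _ in gt_tubes}
    per_class = []
    for cls in classes:
        preds = [p for c, p in pred_tubes if c == cls]
        gts = [g for c, g in gt_tubes if c == cls]
        iou_sum, tp = 0.0, 0
        matched_pred = set()
        for g in gts:
            for i, p in enumerate(preds):
                if i in matched_pred:
                    continue
                iou = len(p & g) / len(p | g)
                if iou > 0.5:
                    iou_sum += iou
                    tp += 1
                    matched_pred.add(i)
                    break
        fp = len(preds) - tp  # unmatched predictions
        fn = len(gts) - tp  # unmatched ground truth
        per_class.append(iou_sum / (tp + 0.5 * fp + 0.5 * fn))
    return sum(per_class) / len(per_class) if per_class else 0.0
```

Because tubes span several frames, a prediction whose identity switches midway splits into two partial tubes and can fail the IoU > 0.5 test, which is how VPQ penalizes tracking errors as well as segmentation errors.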
Date: 2023-06-07T01:24:48Z
- 3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic Segmentation on VSPW [68.56017675820897]
In this paper, we introduce the 3rd place solution for the PVUW2023 VSS track.
We have explored various image-level visual backbones and segmentation heads to tackle the problem of video semantic segmentation.
Date: 2023-06-04T07:50:38Z
- Mask2Former for Video Instance Segmentation [172.10001340104515]
Mask2Former achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss, or even the training pipeline.
We show that universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes.
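Predicting a 3D segmentation volume means that one query embedding is compared against per-pixel features in every frame, so a single query yields a T x H x W mask and its identity is shared across time. The following is a minimal pure-Python sketch of that dot-product step on tiny nested lists; the tensor layout `[T][H][W][D]`, the 0.5 threshold, and the name `predict_mask_volume` are illustrative assumptions, not Mask2Former's code, which operates on batched GPU tensors.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_mask_volume(query, features, threshold=0.5):
    """Binary T x H x W mask for one query embedding.

    query: list of D floats (the query's mask embedding).
    features: nested list [T][H][W][D] of per-pixel embeddings.
    The mask logit at each (t, y, x) is the dot product of the query with that
    pixel's feature; a sigmoid turns it into a probability, which is thresholded.
    """
    return [
        [
            [_sigmoid(sum(q * f for q, f in zip(query, pixel))) > threshold for pixel in row]
            for row in frame
        ]
        for frame in features
    ]
```

Since the same query is applied to all frames, no separate association step is needed within the clip: the object's mask in every frame comes from one embedding.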
Date: 2021-12-20T18:59:59Z
- Video Panoptic Segmentation [117.08520543864054]
We propose and explore a new video extension of panoptic segmentation, called video panoptic segmentation.
To invigorate research on this new task, we present two types of video panoptic datasets.
We propose a novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames.
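VPSNet predicts instance ID tracking jointly with its other outputs using a learned tracking component. As a simpler illustration of what ID tracking has to produce, the following sketch uses a different, non-learned heuristic that is explicitly a stand-in and not VPSNet's method: each frame's instance masks are associated with the previous frame's tracks by greedy mask IoU, and unmatched masks start new IDs. The name `associate_ids` and the threshold `iou_thr` are hypothetical.

```python
def associate_ids(prev_tracks, cur_masks, next_id, iou_thr=0.3):
    """Carry instance ids from the previous frame to the current one.

    Greedy IoU matching (a stand-in heuristic, not VPSNet's tracking head).
    prev_tracks: dict track_id -> set of (row, col) pixels in the previous frame.
    cur_masks: list of pixel sets detected in the current frame.
    next_id: first unused track id.
    Returns (dict track_id -> mask for the current frame, updated next_id).
    """
    pairs = sorted(
        ((len(m & pm) / len(m | pm), tid, i)
         for tid, pm in prev_tracks.items()
         for i, m in enumerate(cur_masks)
         if m | pm),  # skip pairs of two empty masks
        reverse=True,
    )
    assigned, used_tracks, cur_tracks = set(), set(), {}
    for iou, tid, i in pairs:
        if iou < iou_thr:
            break  # pairs are sorted, so the rest are below threshold too
        if tid in used_tracks or i in assigned:
            continue
        cur_tracks[tid] = cur_masks[i]
        used_tracks.add(tid)
        assigned.add(i)
    for i, m in enumerate(cur_masks):
        if i not in assigned:
            cur_tracks[next_id] = m  # unmatched detection starts a new track
            next_id += 1
    return cur_tracks, next_id
```

Running this frame by frame yields consistent IDs for slowly moving objects; a learned tracker such as VPSNet's is aimed at the harder cases (fast motion, occlusion) where plain overlap is unreliable.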
Date: 2020-06-19T19:35:47Z