3rd Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation
- URL: http://arxiv.org/abs/2306.06753v1
- Date: Sun, 11 Jun 2023 19:44:40 GMT
- Title: 3rd Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation
- Authors: Jinming Su, Wangwang Yang, Junfeng Luo and Xiaolin Wei
- Abstract summary: We propose a robust integrated video panoptic segmentation solution.
In our solution, we represent both semantic and instance targets as a set of queries.
We then combine these queries with video features extracted by neural networks to predict segmentation masks.
- Score: 10.04177400017471
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In order to deal with the task of video panoptic segmentation in the wild, we
propose a robust integrated video panoptic segmentation solution. In our
solution, we regard the video panoptic segmentation task as a segmentation
target querying task, represent both semantic and instance targets as a set of
queries, and then combine these queries with video features extracted by neural
networks to predict segmentation masks. In order to improve the learning
accuracy and convergence speed of the solution, we add additional tasks of
video semantic segmentation and video instance segmentation for joint training.
We also add an image semantic segmentation model to
further improve performance on semantic classes, along with some
additional operations that improve the robustness of the model. Extensive
experiments on the VIPSeg dataset show that the proposed solution achieves
state-of-the-art performance with 50.04% VPQ on the VIPSeg test set, which is
3rd place on the video panoptic segmentation track of the PVUW Challenge 2023.
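The abstract gives no architectural details, so the following is only a minimal sketch of the general idea of combining target queries with video features to predict masks. It assumes the common query-based formulation in which each query embedding is compared with every per-pixel feature by a dot product, and the same query is applied to all frames of a clip so that one query yields one target's mask across time. The function names, the toy features, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_masks(queries, video_features, threshold=0.5):
    """Predict one binary mask per query for every frame of a clip.

    queries:        list of Q query embeddings, each a list of C floats
    video_features: list of T frames, each a list of per-pixel feature
                    vectors (C floats per pixel)
    Returns a nested list indexed as masks[query][frame][pixel].
    """
    masks = []
    for query in queries:
        clip_mask = []
        for frame in video_features:
            frame_mask = []
            for pixel in frame:
                # Dot product between the query and the pixel feature.
                logit = sum(q * f for q, f in zip(query, pixel))
                frame_mask.append(1 if sigmoid(logit) > threshold else 0)
            clip_mask.append(frame_mask)
        masks.append(clip_mask)
    return masks


# Toy clip (assumed values): 2 frames, 3 pixels per frame, 2 channels.
queries = [[4.0, -4.0], [-4.0, 4.0]]
video_features = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
]
masks = predict_masks(queries, video_features)
```

Because each query is reused across all frames, the masks it produces in different frames are associated with the same target without a separate matching step. That is the usual motivation for query-based video segmentation, although the paper may handle association differently.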
Related papers
- Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended? [22.191260650245443]
Video segmentation aims at partitioning video sequences into meaningful segments based on objects or regions of interest within frames.
Current video segmentation models are often derived from image segmentation techniques, which struggle to cope with small-scale or class-imbalanced video datasets.
We propose a training strategy Masked Video Consistency, which enhances spatial and temporal feature aggregation.
arXiv Detail & Related papers (2024-08-20T08:08:32Z) - 3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation [19.071113992267826]
We introduce a comprehensive approach centered on the query-wise ensemble, supplemented by additional techniques.
Our proposed approach achieved a VPQ score of 57.01 on the VIPSeg test set, and ranked 3rd in the VPS track of the 3rd Pixel-level Video Understanding in the Wild Challenge.
arXiv Detail & Related papers (2024-06-06T12:22:56Z) - 3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation [63.199793919573295]
Video Object Segmentation (VOS) is a vital task in computer vision, focusing on distinguishing foreground objects from the background across video frames.
Our work draws inspiration from the Cutie model, and we investigate the effects of object memory, the total number of memory frames, and input resolution on segmentation performance.
arXiv Detail & Related papers (2024-06-06T00:56:25Z) - 2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation [12.274092278786966]
Video Panoptic Segmentation (VPS) aims to simultaneously classify, track, and segment all objects in a video.
We propose a robust integrated video panoptic segmentation solution.
Our method achieves state-of-the-art performance with VPQ scores of 56.36 and 57.12 in the development and test phases, respectively.
arXiv Detail & Related papers (2024-06-01T17:03:16Z) - What is Point Supervision Worth in Video Instance Segmentation? [119.71921319637748]
Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos.
We reduce the human annotations to only one point for each object in a video frame during training, and obtain high-quality mask predictions close to fully supervised models.
Comprehensive experiments on three VIS benchmarks demonstrate competitive performance of the proposed framework, nearly matching fully supervised methods.
arXiv Detail & Related papers (2024-04-01T17:38:25Z) - Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation [74.51546366251753]
Video topic segmentation unveils the coarse-grained semantic structure underlying videos.
We introduce a multi-modal video topic segmenter that utilizes both video transcripts and frames.
Our proposed solution significantly surpasses baseline methods in terms of both accuracy and transferability.
arXiv Detail & Related papers (2023-11-30T21:59:05Z) - 3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic Segmentation on VSPW [68.56017675820897]
In this paper, we introduce the 3rd place solution for the PVUW2023 VSS track.
We have explored various image-level visual backbones and segmentation heads to tackle the problem of video semantic segmentation.
arXiv Detail & Related papers (2023-06-04T07:50:38Z) - An End-to-End Trainable Video Panoptic Segmentation Method using Transformers [0.11714813224840924]
We present an algorithm to tackle a video panoptic segmentation problem, a newly emerging area of research.
Our proposed video panoptic segmentation algorithm uses a transformer and can be trained end-to-end with multiple video frames as input.
The method achieved 57.81% on the KITTI-STEP dataset and 31.8% on the MOTChallenge-STEP dataset.
arXiv Detail & Related papers (2021-10-08T10:13:37Z) - Merging Tasks for Video Panoptic Segmentation [0.0]
Video panoptic segmentation (VPS) is a recently introduced computer vision task that requires classifying and tracking every pixel in a given video.
To understand video panoptic segmentation, the earlier constituent tasks that address semantics and tracking separately are studied first.
Two data-driven approaches which do not require training on a tailored dataset will be selected to solve it.
arXiv Detail & Related papers (2021-07-10T08:46:42Z) - Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z) - Video Panoptic Segmentation [117.08520543864054]
We propose and explore a new video extension of this task, called video panoptic segmentation.
To invigorate research on this new task, we present two types of video panoptic datasets.
We propose a novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames.
arXiv Detail & Related papers (2020-06-19T19:35:47Z)