Video Panoptic Segmentation
- URL: http://arxiv.org/abs/2006.11339v1
- Date: Fri, 19 Jun 2020 19:35:47 GMT
- Title: Video Panoptic Segmentation
- Authors: Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon
- Abstract summary: We propose and explore a new video extension of this task, called video panoptic segmentation.
To invigorate research on this new task, we present two types of video panoptic datasets.
We propose a novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames.
- Score: 117.08520543864054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoptic segmentation has become a new standard visual recognition
task by unifying the previously separate semantic segmentation and instance
segmentation tasks. In this paper, we propose and explore a new video extension of this
task, called video panoptic segmentation. The task requires generating
consistent panoptic segmentation as well as an association of instance ids
across video frames. To invigorate research on this new task, we present two
types of video panoptic datasets. The first is a re-organization of the
synthetic VIPER dataset into the video panoptic format to exploit its
large-scale pixel annotations. The second is a temporal extension on the
Cityscapes val. set, by providing new video panoptic annotations
(Cityscapes-VPS). Moreover, we propose a novel video panoptic segmentation
network (VPSNet) which jointly predicts object classes, bounding boxes, masks,
instance id tracking, and semantic segmentation in video frames. To provide
appropriate metrics for this task, we propose a video panoptic quality (VPQ)
metric and evaluate our method and several other baselines. Experimental
results demonstrate the effectiveness of the presented two datasets. We achieve
state-of-the-art results in image PQ on Cityscapes and also in VPQ on
Cityscapes-VPS and VIPER datasets. The datasets and code are made publicly
available.
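The VPQ metric mentioned above extends image-level panoptic quality (PQ) by matching predicted and ground-truth segments as spatio-temporal mask "tubes" spanning a window of frames. As a rough, simplified illustration only (not the authors' implementation — the real VPQ is computed per class, over several window sizes, and then averaged), the core tube-matching computation can be sketched as follows; the function names `iou` and `panoptic_quality` are our own:

```python
import numpy as np

def iou(mask_a, mask_b):
    """IoU between two boolean mask tubes of shape (T, H, W)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

def panoptic_quality(pred_tubes, gt_tubes, thresh=0.5):
    """PQ-style score over mask tubes:
    sum of matched IoUs / (TP + 0.5 * FP + 0.5 * FN).
    pred_tubes / gt_tubes: lists of boolean arrays of shape (T, H, W)."""
    matched_gt = set()
    tp_ious = []
    for p in pred_tubes:
        best, best_j = 0.0, None
        for j, g in enumerate(gt_tubes):
            if j in matched_gt:
                continue
            v = iou(p, g)
            if v > best:
                best, best_j = v, j
        # A match counts as a true positive only above the IoU threshold.
        if best_j is not None and best > thresh:
            matched_gt.add(best_j)
            tp_ious.append(best)
    tp = len(tp_ious)
    fp = len(pred_tubes) - tp
    fn = len(gt_tubes) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(tp_ious) / denom if denom > 0 else 0.0
```

Because instances are matched as whole tubes, an identity switch partway through the window breaks the tube's IoU, so VPQ penalizes inconsistent tracking as well as poor per-frame segmentation.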
Related papers
- 2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation [12.274092278786966]
Video Panoptic Segmentation (VPS) aims to simultaneously classify, track, and segment all objects in a video.
We propose a robust integrated video panoptic segmentation solution.
Our method achieves state-of-the-art performance with a VPQ score of 56.36 and 57.12 in the development and test phases.
arXiv Detail & Related papers (2024-06-01T17:03:16Z)
- DVIS++: Improved Decoupled Framework for Universal Video Segmentation [30.703276476607545]
By integrating CLIP with DVIS++, we present OV-DVIS++, the first open-vocabulary universal video segmentation framework.
arXiv Detail & Related papers (2023-12-20T03:01:33Z)
- Towards Open-Vocabulary Video Instance Segmentation [61.469232166803465]
Video Instance Segmentation aims at segmenting and categorizing objects in videos from a closed set of training categories.
We introduce the novel task of Open-Vocabulary Video Instance Segmentation, which aims to simultaneously segment, track, and classify objects in videos from open-set categories.
To benchmark Open-Vocabulary VIS, we collect a Large-Vocabulary Video Instance Segmentation dataset (LV-VIS), which contains well-annotated objects from 1,196 diverse categories.
arXiv Detail & Related papers (2023-04-04T11:25:23Z)
- Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has the lowest run-time among contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z)
- Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation [29.454785969084384]
Video Panoptic Segmentation (VPS) aims at assigning a class label to each pixel and uniquely segmenting and identifying all object instances consistently across all frames.
We present Slot-VPS, the first end-to-end framework for this task.
We encode all panoptic entities in a video, including instances and background semantics, with a unified representation called panoptic slots.
Coherent spatio-temporal object information is retrieved and encoded into the panoptic slots by the proposed Video Panoptic Retriever, enabling it to localize, segment, differentiate, and associate objects in a unified manner.
arXiv Detail & Related papers (2021-12-16T15:12:22Z)
- An End-to-End Trainable Video Panoptic Segmentation Method using Transformers [0.11714813224840924]
We present an algorithm to tackle a video panoptic segmentation problem, a newly emerging area of research.
Our proposed video panoptic segmentation algorithm uses the transformer and can be trained end-to-end with an input of multiple video frames.
The method achieved 57.81% on the KITTI-STEP dataset and 31.8% on the MOTChallenge-STEP dataset.
arXiv Detail & Related papers (2021-10-08T10:13:37Z)
- Merging Tasks for Video Panoptic Segmentation [0.0]
Video panoptic segmentation (VPS) is a recently introduced computer vision task that requires classifying and tracking every pixel in a given video.
To understand video panoptic segmentation, the earlier-introduced constituent tasks that focus on semantics and tracking separately will first be studied.
Two data-driven approaches that do not require training on a tailored dataset will then be selected to solve it.
arXiv Detail & Related papers (2021-07-10T08:46:42Z)
- A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
- Learning to Associate Every Segment for Video Panoptic Segmentation [123.03617367709303]
We learn coarse segment-level matching and fine pixel-level matching together.
We show that our per-frame computation model can achieve new state-of-the-art results on Cityscapes-VPS and VIPER datasets.
arXiv Detail & Related papers (2021-06-17T13:06:24Z)
- STEP: Segmenting and Tracking Every Pixel [107.23184053133636]
We present a new benchmark: Segmenting and Tracking Every Pixel (STEP).
Our work is the first that targets this task in a real-world setting that requires dense interpretation in both spatial and temporal domains.
For measuring performance, we propose a novel evaluation metric, Segmentation and Tracking Quality (STQ).
arXiv Detail & Related papers (2021-02-23T18:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.