Merging Tasks for Video Panoptic Segmentation
- URL: http://arxiv.org/abs/2108.04223v1
- Date: Sat, 10 Jul 2021 08:46:42 GMT
- Title: Merging Tasks for Video Panoptic Segmentation
- Authors: Jake Rap, Panagiotis Meletis
- Abstract summary: Video panoptic segmentation (VPS) is a recently introduced computer vision task that requires classifying and tracking every pixel in a given video.
To understand video panoptic segmentation, earlier-introduced constituent tasks that focus on semantics and tracking separately are studied first.
Two data-driven approaches that do not require training on a tailored dataset are then selected to solve it.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, the task of video panoptic segmentation is studied and two
different methods to solve it are proposed. Video panoptic
segmentation (VPS) is a recently introduced computer vision task that requires
classifying and tracking every pixel in a given video. The nature of this task
makes the cost of annotating datasets for it prohibitive. To understand video
panoptic segmentation, earlier-introduced constituent tasks that focus
on semantics and tracking separately are studied first. Thereafter, two
data-driven approaches that do not require training on a tailored VPS dataset
will be selected to solve it. The first approach will show how a model for
video panoptic segmentation can be built by heuristically fusing the outputs of
a pre-trained semantic segmentation model and a pre-trained multi-object
tracking model. This can be desirable if one wants to easily extend the
capabilities of either model. The second approach will counter some of the
shortcomings of the first approach by building on top of a shared neural
network backbone with task-specific heads. This network is designed for
panoptic segmentation and will be extended by a mask propagation module to link
instance masks across time, yielding the video panoptic segmentation format.
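To make the first approach concrete, below is a minimal Python sketch of one way such a heuristic fusion could look. This is an illustration under stated assumptions, not the paper's exact heuristic: the function name `fuse_panoptic`, the `class_id * 1000 + instance_id` label encoding, and the input formats are all hypothetical.

```python
import numpy as np

def fuse_panoptic(semantic: np.ndarray, instances: list) -> np.ndarray:
    """Fuse a semantic map with tracked instance masks into a panoptic map.

    semantic:  (H, W) int array of class ids from a pre-trained
               semantic segmentation model.
    instances: list of (track_id, class_id, mask) triples from a
               pre-trained multi-object tracker; mask is a boolean
               (H, W) array. track_id is assumed to start at 1.
    Returns an (H, W) array encoding each pixel as
    class_id * 1000 + instance_id, with instance_id 0 for "stuff".
    """
    # Start from the semantic prediction: every pixel begins as "stuff".
    panoptic = semantic.astype(np.int64) * 1000
    # Overlay tracked instances; later masks overwrite earlier ones,
    # so sorting `instances` by confidence beforehand resolves overlaps.
    for track_id, class_id, mask in instances:
        panoptic[mask] = class_id * 1000 + track_id
    return panoptic
```

Applying such a fusion per frame, with track ids kept consistent by the tracker, would yield per-frame panoptic maps whose instance ids are stable over time, which matches the video panoptic segmentation output format.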
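For the second approach, the mask propagation module links instance masks across time. A common way to implement such linking, shown here as a hedged sketch rather than the paper's actual module, is greedy IoU matching between consecutive frames; the helpers `mask_iou` and `link_masks` are hypothetical names.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def link_masks(prev_tracks: dict, curr_masks: dict, thresh: float = 0.5) -> dict:
    """Greedily match current-frame masks to previous-frame tracks by IoU.

    prev_tracks: {track_id: boolean (H, W) mask} from the previous frame.
    curr_masks:  {detection_id: boolean (H, W) mask} in the current frame.
    Returns {detection_id: track_id or None}; None signals a new track.
    """
    assignment, used = {}, set()
    for det_id, cmask in curr_masks.items():
        best_id, best_iou = None, thresh  # only accept matches above thresh
        for track_id, pmask in prev_tracks.items():
            if track_id in used:
                continue
            score = mask_iou(cmask, pmask)
            if score > best_iou:
                best_id, best_iou = track_id, score
        if best_id is not None:
            used.add(best_id)
        assignment[det_id] = best_id
    return assignment
```

Greedy matching is a simplification; an optimal assignment (e.g. Hungarian matching on the IoU matrix) is a standard alternative when masks overlap heavily.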
Related papers
- Tracking Anything with Decoupled Video Segmentation [87.07258378407289]
We develop a decoupled video segmentation approach (DEVA).
It is composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation.
We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks.
arXiv Detail & Related papers (2023-09-07T17:59:41Z)
- Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple, end-to-end trainable bottom-up approach that achieves instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has minimal run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z)
- An End-to-End Trainable Video Panoptic Segmentation Method using Transformers [0.11714813224840924]
We present an algorithm to tackle video panoptic segmentation, a newly emerging area of research.
Our proposed video panoptic segmentation algorithm uses transformers and can be trained end-to-end with multiple video frames as input.
The method achieved 57.81% on the KITTI-STEP dataset and 31.8% on the MOTChallenge-STEP dataset.
arXiv Detail & Related papers (2021-10-08T10:13:37Z)
- Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes.
We propose the Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information online.
PCAN outperforms current video instance tracking and segmentation competition winners on the YouTube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z)
- Learning to Associate Every Segment for Video Panoptic Segmentation [123.03617367709303]
We learn coarse segment-level matching and fine pixel-level matching together.
We show that our per-frame computation model can achieve new state-of-the-art results on Cityscapes-VPS and VIPER datasets.
arXiv Detail & Related papers (2021-06-17T13:06:24Z)
- Video Panoptic Segmentation [117.08520543864054]
We propose and explore a new video extension of this task, called video panoptic segmentation.
To invigorate research on this new task, we present two types of video panoptic datasets.
We propose a novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames.
arXiv Detail & Related papers (2020-06-19T19:35:47Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.