Hybrid Tracker with Pixel and Instance for Video Panoptic Segmentation
- URL: http://arxiv.org/abs/2203.01217v2
- Date: Mon, 11 Dec 2023 08:28:02 GMT
- Title: Hybrid Tracker with Pixel and Instance for Video Panoptic Segmentation
- Authors: Weicai Ye, Xinyue Lan, Ge Su, Hujun Bao, Zhaopeng Cui, Guofeng Zhang
- Abstract summary: Video Panoptic Segmentation (VPS) aims to generate coherent panoptic segmentation and track the identities of all pixels across video frames.
We present HybridTracker, a lightweight and joint tracking model attempting to eliminate the limitations of the single tracker.
Comprehensive experiments show that HybridTracker outperforms state-of-the-art methods on the Cityscapes-VPS and VIPER datasets.
- Score: 50.62685357414904
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Video Panoptic Segmentation (VPS) aims to generate coherent panoptic
segmentation and track the identities of all pixels across video frames.
Existing methods predominantly utilize the trained instance embedding to keep
the consistency of panoptic segmentation. However, they inevitably struggle to
cope with the challenges of small objects, similar appearance but inconsistent
identities, occlusion, and strong instance contour deformations. To address
these problems, we present HybridTracker, a lightweight and joint tracking
model attempting to eliminate the limitations of the single tracker.
HybridTracker runs a pixel tracker and an instance tracker in parallel to obtain
the association matrices, which are fused into a matching matrix. In the
instance tracker, we design a differentiable matching layer, ensuring the
stability of inter-frame matching. In the pixel tracker, we compute the dice
coefficient of the same instance of different frames given the estimated
optical flow, forming the Intersection Over Union (IoU) matrix. We additionally
propose mutual check and temporal consistency constraints during inference to
settle the occlusion and contour deformation challenges. Comprehensive
experiments show that HybridTracker outperforms
state-of-the-art methods on the Cityscapes-VPS and VIPER datasets.
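The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`dice_matrix`, `fuse`, `mutual_matches`), the equal-weight averaging, and the 0.5 threshold are all assumptions made for illustration, and the previous-frame masks are assumed to have already been warped into the current frame by optical flow. The learned differentiable matching layer is replaced here by a plain, hypothetical similarity matrix.

```python
import numpy as np


def dice_matrix(masks_prev, masks_curr):
    """Pairwise Dice coefficients between two stacks of boolean masks.

    masks_prev: (M, H, W) previous-frame masks, assumed already warped
                into the current frame (flow warping is not shown here).
    masks_curr: (N, H, W) current-frame masks.
    Returns an (M, N) matrix with values in [0, 1].
    """
    a = masks_prev.reshape(len(masks_prev), -1).astype(np.float64)
    b = masks_curr.reshape(len(masks_curr), -1).astype(np.float64)
    inter = a @ b.T  # overlap pixel counts for every mask pair
    sizes = a.sum(axis=1)[:, None] + b.sum(axis=1)[None, :]
    return 2.0 * inter / np.maximum(sizes, 1.0)  # avoid division by zero


def fuse(pixel_scores, instance_scores, weight=0.5):
    """Weighted average of two association matrices (assumed fusion rule)."""
    return weight * pixel_scores + (1.0 - weight) * instance_scores


def mutual_matches(scores, threshold=0.5):
    """Keep pairs (i, j) that are each other's best match and score >= threshold."""
    if scores.size == 0:
        return []
    best_curr = scores.argmax(axis=1)  # best current instance per previous one
    best_prev = scores.argmax(axis=0)  # best previous instance per current one
    return [
        (i, int(j))
        for i, j in enumerate(best_curr)
        if best_prev[j] == i and scores[i, j] >= threshold
    ]


# Toy example: two instances per frame on a 4x4 grid.
prev = np.zeros((2, 4, 4), dtype=bool)
prev[0, :2, :] = True    # top half
prev[1, 2:, :] = True    # bottom half
curr = np.zeros((2, 4, 4), dtype=bool)
curr[0, :2, :3] = True   # top half, slightly smaller
curr[1, 2:, :] = True    # bottom half, unchanged

# Hypothetical appearance similarities standing in for the instance tracker.
instance_scores = np.array([[0.9, 0.1], [0.2, 0.8]])
fused = fuse(dice_matrix(prev, curr), instance_scores)
matches = mutual_matches(fused)
```

The mutual check discards a pair unless each side selects the other as its best match, which is one simple way to reject ambiguous associations such as those caused by occlusion.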
Related papers
- Tracking by Associating Clips [110.08925274049409]
In this paper, we investigate an alternative by treating object association as clip-wise matching.
Our new perspective views a single long video sequence as multiple short clips, and then the tracking is performed both within and between the clips.
The benefits of this new approach are twofold. First, our method is robust to tracking error accumulation or propagation, as the video chunking allows bypassing the interrupted frames.
Second, the multiple frame information is aggregated during the clip-wise matching, resulting in a more accurate long-range track association than the current frame-wise matching.
arXiv Detail & Related papers (2022-12-20T10:33:17Z)
- Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training [31.115226660100294]
We propose a framework to feed the unlabeled video frames together with the labeled images into an image shadow detection network training.
We then derive the spatial and temporal consistency constraints accordingly for enhancing generalization in the pixel-wise classification.
In addition, we design a Scale-Aware Network for multi-scale shadow knowledge learning in images.
arXiv Detail & Related papers (2022-06-17T14:29:51Z)
- Video Instance Segmentation by Instance Flow Assembly [23.001856276175506]
Bottom-up methods dealing with box-free features could offer accurate spatial correlations across frames.
We propose our framework equipped with a temporal context fusion module to better encode inter-frame correlations.
Experiments demonstrate that the proposed method outperforms the state-of-the-art online methods (taking image-level input) on the challenging YouTube-VIS dataset.
arXiv Detail & Related papers (2021-10-20T14:49:28Z)
- Polygonal Point Set Tracking [50.445151155209246]
We propose a novel learning-based polygonal point set tracking method.
Our goal is to track corresponding points on the target contour.
We present visual-effects applications of our method on part distortion and text mapping.
arXiv Detail & Related papers (2021-05-30T17:12:36Z)
- Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation [16.692219644392253]
We propose a one-stage video instance segmentation framework by spatial calibration and temporal fusion, namely STMask.
Experiments on the YouTube-VIS valid set show that the proposed STMask with ResNet-50/-101 backbone obtains 33.5 % / 36.8 % mask AP, while achieving 28.6 / 23.4 FPS on video instance segmentation.
arXiv Detail & Related papers (2021-04-06T09:26:58Z)
- Learning to Track Instances without Video Annotations [85.9865889886669]
We introduce a novel semi-supervised framework by learning instance tracking networks with only a labeled image dataset and unlabeled video sequences.
We show that even when only trained with images, the learned feature representation is robust to instance appearance variations.
In addition, we integrate this module into single-stage instance segmentation and pose estimation frameworks.
arXiv Detail & Related papers (2021-04-01T06:47:41Z)
- CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation [67.17625278621134]
Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video.
Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects.
We propose a novel comprehensive feature aggregation approach (CompFeat) to refine features at both frame-level and object-level with temporal and spatial context information.
arXiv Detail & Related papers (2020-12-07T00:31:42Z)
- Unsupervised Spatio-temporal Latent Feature Clustering for Multiple-object Tracking and Segmentation [0.5591659577198183]
We propose a strategy that treats the temporal identification task as a heterogeneous-temporal clustering problem.
We use a convolutional and fully connected autoencoder to learn discriminative features from segmentation masks and detection bounding boxes.
Our results show that our technique outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2020-07-14T16:47:56Z)
- Tracking Road Users using Constraint Programming [79.32806233778511]
We present a constraint programming (CP) approach for the data association phase found in the tracking-by-detection paradigm of the multiple object tracking (MOT) problem.
Our proposed method was tested on a motorized vehicle tracking dataset and produces results that outperform the top methods of the UA-DETRAC benchmark.
arXiv Detail & Related papers (2020-03-10T00:04:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.