Unified Perception: Efficient Depth-Aware Video Panoptic Segmentation with Minimal Annotation Costs
- URL: http://arxiv.org/abs/2303.01991v2
- Date: Sun, 2 Apr 2023 17:25:52 GMT
- Title: Unified Perception: Efficient Depth-Aware Video Panoptic Segmentation with Minimal Annotation Costs
- Authors: Kurt Stolle and Gijs Dubbelman
- Abstract summary: We present a new approach titled Unified Perception that achieves state-of-the-art performance without requiring video-based training.
Our method employs a simple two-stage cascaded tracking algorithm that (re)uses object embeddings computed in an image-based network.
- Score: 2.7920304852537536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth-aware video panoptic segmentation is a promising approach to camera-based scene understanding. However, the current state-of-the-art methods require costly video annotations and use a complex training pipeline compared to their image-based equivalents. In this paper, we present a new approach titled Unified Perception that achieves state-of-the-art performance without requiring video-based training. Our method employs a simple two-stage cascaded tracking algorithm that (re)uses object embeddings computed in an image-based network. Experimental results on the Cityscapes-DVPS dataset demonstrate that our method achieves an overall DVPQ of 57.1, surpassing state-of-the-art methods. Furthermore, we show that our tracking strategies are effective for long-term object association on KITTI-STEP, achieving an STQ of 59.1, which exceeds the performance of state-of-the-art methods that employ the same backbone network.
Code is available at: https://tue-mps.github.io/unipercept
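As a rough illustration of how a two-stage cascaded tracker that reuses image-based object embeddings might operate, the sketch below matches detections to existing tracks by embedding similarity in a strict first pass, then retries the leftovers at a looser threshold in a second pass. This is a minimal sketch under assumed conventions, not the authors' implementation; all function names and thresholds are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_similarity(a, b):
    # Pairwise cosine similarity between two embedding sets [N, D] x [M, D].
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return a @ b.T

def cascaded_match(track_embs, det_embs, t_strict=0.7, t_loose=0.5):
    # Stage 1 matches at a strict similarity threshold; stage 2 retries the
    # leftovers at a looser one. Hungarian assignment is used in both stages.
    sim = cosine_similarity(track_embs, det_embs)
    matches = {}
    open_t = list(range(len(track_embs)))
    open_d = list(range(len(det_embs)))
    for thresh in (t_strict, t_loose):
        if not open_t or not open_d:
            break
        sub = sim[np.ix_(open_t, open_d)]
        rows, cols = linear_sum_assignment(-sub)  # maximize total similarity
        for r, c in zip(rows, cols):
            if sub[r, c] >= thresh:
                matches[open_t[r]] = open_d[c]
        open_t = [t for t in open_t if t not in matches]
        open_d = [d for d in open_d if d not in matches.values()]
    return matches, open_t, open_d
```

Detections that survive both passes unmatched would start new tracks, while tracks left unmatched for several frames would be retired; the paper's actual cascade may differ in both its stages and its matching criteria.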
Related papers
- Rethinking Image-to-Video Adaptation: An Object-centric Perspective [61.833533295978484]
We propose a novel and efficient image-to-video adaptation strategy from the object-centric perspective.
Inspired by human perception, we integrate a proxy task of object discovery into image-to-video transfer learning.
arXiv Detail & Related papers (2024-07-09T13:58:10Z)
- Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations [79.87044240860466]
We propose a novel approach that incorporates temporal consistency in dense self-supervised learning.
Our approach, which we call time-tuning, starts from image-pretrained models and fine-tunes them with a novel self-supervised temporal-alignment clustering loss on unlabeled videos.
Time-tuning improves the state-of-the-art by 8-10% for unsupervised semantic segmentation on videos and matches it for images.
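As a hedged sketch of what a self-supervised temporal-alignment clustering loss could look like (the paper's exact formulation may differ), the snippet below computes soft cluster assignments over shared prototypes for features of two adjacent frames and lets each frame's assignment supervise the other, using plain softmax targets rather than any particular normalization scheme. Shapes, temperature, and prototype count are assumptions.

```python
import torch
import torch.nn.functional as F

def temporal_alignment_cluster_loss(feat_t, feat_t1, prototypes, temp=0.1):
    # feat_t, feat_t1: [N, D] features at temporally aligned locations of
    # two adjacent frames; prototypes: [K, D] learnable cluster centers.
    feat_t = F.normalize(feat_t, dim=1)
    feat_t1 = F.normalize(feat_t1, dim=1)
    protos = F.normalize(prototypes, dim=1)
    logits_t = feat_t @ protos.T / temp      # [N, K] cluster logits
    logits_t1 = feat_t1 @ protos.T / temp
    with torch.no_grad():                    # soft targets, no gradient
        q_t = F.softmax(logits_t, dim=1)
        q_t1 = F.softmax(logits_t1, dim=1)
    # Each frame's assignment is predicted from the other frame's target.
    loss_t = -(q_t1 * F.log_softmax(logits_t, dim=1)).sum(dim=1).mean()
    loss_t1 = -(q_t * F.log_softmax(logits_t1, dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_t + loss_t1)
```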
arXiv Detail & Related papers (2023-08-22T21:28:58Z)
- Learning Video Salient Object Detection Progressively from Unlabeled Videos [8.224670666756193]
We propose a novel VSOD method via a progressive framework that locates and segments salient objects in sequence without utilizing any video annotation.
Specifically, an algorithm for generating deep temporal location labels is proposed, which consists of generating high-saliency location labels and tracking salient objects in adjacent frames.
Although our method does not require labeled video at all, the experimental results on five public benchmarks of DAVIS, FBMS, ViSal, VOS, and DAVSOD demonstrate that our proposed method is competitive with fully supervised methods and outperforms the state-of-the-art weakly and unsupervised methods.
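The label-generation idea can be sketched, with illustrative thresholds and a deliberately simplified propagation rule, as: keep only high-confidence salient and background pixels as partial labels, then warp them into the adjacent frame along optical flow. This is a plausible reading of the summary, not the paper's actual algorithm.

```python
import numpy as np

def high_saliency_labels(sal_map, hi=0.8, lo=0.2):
    # sal_map: soft saliency in [0, 1]. Returns partial labels:
    # 1 = confident salient, 0 = confident background, 255 = ignore.
    labels = np.full(sal_map.shape, 255, dtype=np.uint8)
    labels[sal_map >= hi] = 1
    labels[sal_map <= lo] = 0
    return labels

def propagate_labels(labels, flow):
    # Forward-warp partial labels into the adjacent frame along optical
    # flow (nearest-neighbour splat; later writes win on collisions).
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs2 = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    ys2 = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    warped = np.full_like(labels, 255)
    warped[ys2, xs2] = labels
    return warped
```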
arXiv Detail & Related papers (2022-04-05T06:12:45Z)
- Box Supervised Video Segmentation Proposal Network [3.384080569028146]
We propose a box-supervised video object segmentation proposal network, which takes advantage of intrinsic video properties.
The proposed method outperforms the state-of-the-art self-supervised benchmark by 16.4% and 6.9%.
We provide extensive tests and ablations on the datasets, demonstrating the robustness of our method.
arXiv Detail & Related papers (2022-02-14T20:38:28Z)
- Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a pair of original and processed videos rather than on a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
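A minimal sketch of the Deep Video Prior idea, assuming a generic CNN and an L1 objective: fit a freshly initialized network to map each original frame to its processed counterpart and stop early, since the network tends to learn the consistent mapping before it learns the flicker. Architecture, step count, and loss here are placeholders, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def fit_deep_video_prior(net, original, processed, steps=500, lr=1e-4):
    # original / processed: [T, C, H, W] tensors for ONE video pair;
    # net: a randomly initialized image-to-image CNN.
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):  # early stopping is the point: too many steps
        t = int(torch.randint(len(original), (1,)))  # refits the flicker
        pred = net(original[t:t + 1])
        loss = F.l1_loss(pred, processed[t:t + 1])
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():  # re-render the temporally consistent video
        return torch.cat([net(f.unsqueeze(0)) for f in original])
```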
arXiv Detail & Related papers (2022-01-27T16:38:52Z)
- Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z)
- Blind Video Temporal Consistency via Deep Video Prior [61.062900556483164]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a pair of original and processed videos.
We show that temporal consistency can be achieved by training a convolutional network on a video with the Deep Video Prior.
arXiv Detail & Related papers (2020-10-22T16:19:20Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top-performing objectives in this class: instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)
- Learning Spatiotemporal Features via Video and Text Pair Discrimination [30.64670449131973]
The Cross-Modal Pair Discrimination (CPD) framework captures the correlation between a video and its associated text.
We train our CPD models on both a standard video dataset (Kinetics-210k) and an uncurated web video dataset of roughly 300k videos to demonstrate its effectiveness.
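Pair discrimination between modalities is typically implemented as a symmetric InfoNCE-style contrastive loss over a batch of matched (video, text) embeddings; the generic sketch below illustrates that family of losses and is an assumption, not necessarily CPD's exact objective.

```python
import torch
import torch.nn.functional as F

def pair_discrimination_loss(video_emb, text_emb, temp=0.07):
    # video_emb, text_emb: [B, D]; row i of each side is a positive pair,
    # all other rows in the batch serve as negatives.
    v = F.normalize(video_emb, dim=1)
    t = F.normalize(text_emb, dim=1)
    logits = v @ t.T / temp                   # [B, B] similarity matrix
    targets = torch.arange(len(v), device=v.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))
```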
arXiv Detail & Related papers (2020-01-16T08:28:57Z)