Domain Adaptive Video Segmentation via Temporal Pseudo Supervision
- URL: http://arxiv.org/abs/2207.02372v1
- Date: Wed, 6 Jul 2022 00:36:14 GMT
- Title: Domain Adaptive Video Segmentation via Temporal Pseudo Supervision
- Authors: Yun Xing, Dayan Guan, Jiaxing Huang, Shijian Lu
- Abstract summary: Domain adaptive video segmentation can mitigate data labelling constraints by adapting from a labelled source domain toward an unlabelled target domain.
We design temporal pseudo supervision (TPS), a simple and effective method that explores the idea of consistency training for learning effective representations from unlabelled target videos.
We show that TPS is simpler to implement, much more stable to train, and achieves superior video segmentation accuracy as compared with the state-of-the-art.
- Score: 46.38660541271893
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video semantic segmentation has achieved great progress under the supervision
of large amounts of labelled training data. However, domain adaptive video
segmentation, which can mitigate data labelling constraints by adapting from a
labelled source domain toward an unlabelled target domain, is largely
neglected. We design temporal pseudo supervision (TPS), a simple and effective
method that explores the idea of consistency training for learning effective
representations from unlabelled target videos. Unlike traditional consistency
training that builds consistency in spatial space, we explore consistency
training in spatiotemporal space by enforcing model consistency across
augmented video frames which helps learn from more diverse target data.
Specifically, we design cross-frame pseudo labelling to provide pseudo
supervision from previous video frames while learning from the augmented
current video frames. The cross-frame pseudo labelling encourages the network
to produce high-certainty predictions, which facilitates consistency training
with cross-frame augmentation effectively. Extensive experiments over multiple
public datasets show that TPS is simpler to implement, much more stable to
train, and achieves superior video segmentation accuracy as compared with the
state-of-the-art.
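The cross-frame pseudo labelling described in the abstract can be illustrated with a short PyTorch-style sketch. The code below is not the authors' implementation; `model`, `augment`, the confidence threshold, and the flow-warping helper are illustrative assumptions, and a photometric-only augmentation is assumed so the pseudo labels stay pixel-aligned with the current frame.

```python
import torch
import torch.nn.functional as F


def warp_with_flow(x, flow):
    """Backward-warp map x into the current frame.

    Assumes `flow` (B, 2, H, W) maps each current-frame pixel back to its
    location in the previous frame (backward-flow convention).
    """
    B, _, H, W = x.shape
    ys, xs = torch.meshgrid(
        torch.arange(H), torch.arange(W), indexing="ij"
    )
    grid = torch.stack((xs, ys), dim=0).float().to(x.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow                         # (B, 2, H, W)
    # Normalise pixel coordinates to [-1, 1] for grid_sample.
    coords_x = 2 * coords[:, 0] / (W - 1) - 1
    coords_y = 2 * coords[:, 1] / (H - 1) - 1
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)     # (B, H, W, 2)
    return F.grid_sample(x, grid_norm, align_corners=True)


def tps_loss(model, frame_prev, frame_curr, flow, augment, conf_thresh=0.9):
    """Cross-frame pseudo supervision on unlabelled target frames (sketch)."""
    # 1. Predict on the clean previous frame; it only supplies pseudo
    #    labels, so no gradients flow through this branch.
    with torch.no_grad():
        probs_prev = torch.softmax(model(frame_prev), dim=1)  # (B, C, H, W)
        # 2. Warp the previous-frame prediction to the current frame.
        probs_warp = warp_with_flow(probs_prev, flow)
        conf, pseudo = probs_warp.max(dim=1)                  # (B, H, W)
        valid = conf > conf_thresh  # keep only high-certainty pixels

    # 3. Predict on the augmented current frame and enforce consistency
    #    with the warped cross-frame pseudo labels.
    logits_curr = model(augment(frame_curr))
    loss = F.cross_entropy(logits_curr, pseudo, reduction="none")
    return (loss * valid).sum() / valid.sum().clamp(min=1)
```

The key design point matches the abstract: supervision comes from a different (previous) frame than the one being augmented, so the consistency constraint spans spatiotemporal rather than purely spatial perturbations.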
Related papers
- SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations [12.139451002212063]
SSVOD exploits motion dynamics of videos to utilize large-scale unlabeled frames with sparse annotations.
Our method achieves significant performance improvements over existing methods on ImageNet-VID, Epic-KITCHENS, and YouTube-VIS.
arXiv Detail & Related papers (2023-09-04T06:41:33Z) - Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z) - Video Annotation for Visual Tracking via Selection and Refinement [74.08109740917122]
We present a new framework to facilitate bounding box annotations for video sequences.
A temporal assessment network is proposed which is able to capture the temporal coherence of target locations.
A visual-geometry refinement network is also designed to further enhance the selected tracking results.
arXiv Detail & Related papers (2021-08-09T05:56:47Z) - Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real data.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z) - Domain Adaptive Video Segmentation via Temporal Consistency Regularization [32.77436219094282]
This paper presents DA-VSN, a domain adaptive video segmentation network that addresses domain gaps in videos by temporal consistency regularization (TCR), which comprises two designs.
The first is cross-domain TCR that guides the prediction of target frames to have similar temporal consistency as that of source frames (learnt from annotated source data) via adversarial learning.
The second is intra-domain TCR that guides unconfident predictions of target frames to have similar temporal consistency as confident predictions of target frames.
arXiv Detail & Related papers (2021-07-23T02:50:42Z) - CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning [49.18591896085498]
We propose CUPID to bridge the domain gap between source and target data.
CUPID yields new state-of-the-art performance across multiple video-language and video tasks.
arXiv Detail & Related papers (2021-04-01T06:42:16Z) - Contrastive Transformation for Self-supervised Correspondence Learning [120.62547360463923]
We study the self-supervised learning of visual correspondence using unlabeled videos in the wild.
Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation.
Our framework outperforms the recent self-supervised correspondence methods on a range of visual tasks.
arXiv Detail & Related papers (2020-12-09T14:05:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.