Domain Adaptive Video Segmentation via Temporal Pseudo Supervision
- URL: http://arxiv.org/abs/2207.02372v1
- Date: Wed, 6 Jul 2022 00:36:14 GMT
- Title: Domain Adaptive Video Segmentation via Temporal Pseudo Supervision
- Authors: Yun Xing, Dayan Guan, Jiaxing Huang, Shijian Lu
- Abstract summary: Domain adaptive video segmentation can mitigate data labelling constraints by adapting from a labelled source domain toward an unlabelled target domain.
We design temporal pseudo supervision (TPS), a simple and effective method that explores the idea of consistency training for learning effective representations from unlabelled target videos.
We show that TPS is simpler to implement, much more stable to train, and achieves superior video segmentation accuracy as compared with the state-of-the-art.
- Score: 46.38660541271893
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video semantic segmentation has achieved great progress under the supervision
of large amounts of labelled training data. However, domain adaptive video
segmentation, which can mitigate data labelling constraints by adapting from a
labelled source domain toward an unlabelled target domain, is largely
neglected. We design temporal pseudo supervision (TPS), a simple and effective
method that explores the idea of consistency training for learning effective
representations from unlabelled target videos. Unlike traditional consistency
training that builds consistency in spatial space, we explore consistency
training in spatiotemporal space by enforcing model consistency across
augmented video frames, which helps the model learn from more diverse target data.
Specifically, we design cross-frame pseudo labelling to provide pseudo
supervision from previous video frames while learning from the augmented
current video frames. The cross-frame pseudo labelling encourages the network
to produce high-certainty predictions, which makes consistency training with
cross-frame augmentation more effective. Extensive experiments over multiple
public datasets show that TPS is simpler to implement, much more stable to
train, and achieves superior video segmentation accuracy as compared with the
state-of-the-art.
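The cross-frame pseudo labelling idea can be illustrated with a minimal sketch. It assumes the previous frame's per-pixel class probabilities are already aligned to the current frame (for example by optical-flow warping, which the abstract does not state), and it uses a hypothetical confidence threshold to decide which pixels receive a pseudo label. The helper names `cross_frame_pseudo_labels` and `pseudo_supervised_loss` are illustrative only and are not the authors' code; pixels are flattened into a list, and plain Python replaces a deep-learning framework.

```python
import math
from typing import List, Optional

# Per-pixel class probabilities, shape [num_pixels][num_classes].
Probs = List[List[float]]


def cross_frame_pseudo_labels(prev_probs: Probs, threshold: float = 0.9) -> List[Optional[int]]:
    """Hypothetical helper: turn (already aligned) previous-frame predictions
    into pseudo labels. Pixels whose top probability is below `threshold`
    (an assumed hyperparameter) get None and are ignored by the loss."""
    labels: List[Optional[int]] = []
    for p in prev_probs:
        conf = max(p)
        labels.append(p.index(conf) if conf >= threshold else None)
    return labels


def pseudo_supervised_loss(curr_aug_probs: Probs, labels: List[Optional[int]]) -> float:
    """Mean cross-entropy of the augmented current frame's predictions
    against the cross-frame pseudo labels, over labelled pixels only."""
    terms = [
        -math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        for p, y in zip(curr_aug_probs, labels)
        if y is not None
    ]
    return sum(terms) / len(terms) if terms else 0.0


# Toy example: 3 pixels, 2 classes.
prev = [[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]]   # previous frame (aligned)
curr_aug = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]   # augmented current frame
labels = cross_frame_pseudo_labels(prev)
print(labels)  # [0, None, 1]
print(pseudo_supervised_loss(curr_aug, labels))
```

Here the middle pixel is skipped because its previous-frame confidence (0.6) is below the assumed threshold, so only confident cross-frame predictions supervise the augmented current frame.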
 
      
Related papers
- High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight [0.9012198585960443]
 We propose a lightweight video semantic segmentation approach, suited to onboard real-time inference, that achieves high temporal consistency on aerial data.
SSP temporally propagates the predictions of an efficient image segmentation model with global registration alignment to compensate for camera movements.
It provides a better trade-off between segmentation quality and inference speed than other video methods proposed for general applications.
 arXiv (2025-03-19T20:12:07Z)
- SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations [12.139451002212063]
 SSVOD exploits motion dynamics of videos to utilize large-scale unlabeled frames with sparse annotations.
Our method achieves significant performance improvements over existing methods on ImageNet-VID, Epic-KITCHENS, and YouTube-VIS.
 arXiv (2023-09-04T06:41:33Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
 We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
 arXiv (2023-05-06T19:29:28Z)
- Video Annotation for Visual Tracking via Selection and Refinement [74.08109740917122]
 We present a new framework to facilitate bounding box annotations for video sequences.
A temporal assessment network is proposed to capture the temporal coherence of target locations.
A visual-geometry refinement network is also designed to further enhance the selected tracking results.
 arXiv (2021-08-09T05:56:47Z)
- Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
 Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulated to real data.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
 arXiv (2021-07-23T07:18:20Z)
- Domain Adaptive Video Segmentation via Temporal Consistency Regularization [32.77436219094282]
 This paper presents DA-VSN, a domain adaptive video segmentation network that addresses domain gaps in videos by temporal consistency regularization (TCR) with two complementary designs.
The first is cross-domain TCR that guides the prediction of target frames to have similar temporal consistency as that of source frames (learnt from annotated source data) via adversarial learning.
The second is intra-domain TCR that guides unconfident predictions of target frames to have similar temporal consistency as confident predictions of target frames.
 arXiv (2021-07-23T02:50:42Z)
- CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning [49.18591896085498]
 We propose CUPID to bridge the domain gap between source and target data.
CUPID yields new state-of-the-art performance across multiple video-language and video tasks.
 arXiv (2021-04-01T06:42:16Z)
- Contrastive Transformation for Self-supervised Correspondence Learning [120.62547360463923]
 We study the self-supervised learning of visual correspondence using unlabeled videos in the wild.
Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation.
Our framework outperforms the recent self-supervised correspondence methods on a range of visual tasks.
 arXiv (2020-12-09T14:05:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     