Unsupervised Domain Adaptation for Video Semantic Segmentation
- URL: http://arxiv.org/abs/2107.11052v1
- Date: Fri, 23 Jul 2021 07:18:20 GMT
- Title: Unsupervised Domain Adaptation for Video Semantic Segmentation
- Authors: Inkyu Shin, Kwanyong Park, Sanghyun Woo, In So Kweon
- Abstract summary: Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
- Score: 91.30558794056054
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised Domain Adaptation for semantic segmentation has gained immense
popularity since it can transfer knowledge from simulation to real (Sim2Real)
by largely cutting out the laborious per-pixel labeling effort in the real domain. In
this work, we present a new video extension of this task, namely Unsupervised
Domain Adaptation for Video Semantic Segmentation. As it has become easy to obtain
large-scale video labels through simulation, we believe attempting to maximize
Sim2Real knowledge transferability is one of the promising directions for
resolving the fundamental data-hungry issue in the video domain. To tackle this new
problem, we present a novel two-phase adaptation scheme. In the first step, we
exhaustively distill source domain knowledge using supervised loss functions.
Simultaneously, video adversarial training (VAT) is employed to align the
features from source to target utilizing video context. In the second step, we
apply video self-training (VST), focusing only on the target data. To construct
robust pseudo labels, we exploit the temporal information in the video, which
has rarely been explored in previous image-based self-training approaches.
We set strong baseline scores on the 'VIPER to Cityscapes-VPS' adaptation scenario.
We show that our proposals significantly outperform previous image-based UDA
methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
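The abstract describes the two-phase scheme only at a high level. As a rough illustration, the PyTorch-style sketch below shows how phase one (supervised source loss plus video adversarial training, VAT) and phase two (video self-training, VST, with confidence-filtered, temporally fused pseudo labels) could be wired together. The module names (SegNet, ClipDiscriminator), the mean-over-frames fusion, and the 0.001 adversarial weight are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-phase adaptation scheme described in the abstract.
# All names and hyperparameters here are hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, IGNORE = 19, 255

class SegNet(nn.Module):
    """Toy per-frame segmentation backbone (stand-in for the real model)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, NUM_CLASSES, 1))

    def forward(self, clip):                        # clip: (B, T, 3, H, W)
        b, t, c, h, w = clip.shape
        logits = self.body(clip.flatten(0, 1))      # (B*T, C, H, W)
        return logits.view(b, t, NUM_CLASSES, h, w)

class ClipDiscriminator(nn.Module):
    """Judges whether a clip of softmax maps comes from source or target."""
    def __init__(self, t):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(t * NUM_CLASSES, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=2, padding=1))

    def forward(self, probs):                       # probs: (B, T, C, H, W)
        return self.net(probs.flatten(1, 2))

def phase1_losses(model, disc, src_clip, src_lbl, tgt_clip):
    """Phase 1: supervised source loss + the generator side of video
    adversarial training (the discriminator has its own update step)."""
    src_logits = model(src_clip)
    sup = F.cross_entropy(src_logits.flatten(0, 1), src_lbl.flatten(0, 1),
                          ignore_index=IGNORE)
    tgt_probs = model(tgt_clip).softmax(dim=2)
    d_out = disc(tgt_probs)
    # Push target clips to look "source-like" to the clip-level discriminator.
    adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    return sup + 0.001 * adv

def phase2_pseudo_labels(model, tgt_clip, thresh=0.9):
    """Phase 2 (VST): build pseudo labels by averaging predictions over the
    clip (a crude stand-in for the paper's temporal fusion) and keeping
    only confident pixels."""
    with torch.no_grad():
        probs = model(tgt_clip).softmax(dim=2).mean(dim=1)   # (B, C, H, W)
        conf, label = probs.max(dim=1)
        label[conf < thresh] = IGNORE
    return label

if __name__ == "__main__":
    B, T, H, W = 2, 2, 64, 64
    model, disc = SegNet(), ClipDiscriminator(T)
    src = torch.rand(B, T, 3, H, W)
    lbl = torch.randint(0, NUM_CLASSES, (B, T, H, W))
    tgt = torch.rand(B, T, 3, H, W)
    loss = phase1_losses(model, disc, src, lbl, tgt)
    pseudo = phase2_pseudo_labels(model, tgt)
    print(loss.item(), pseudo.shape)
```

The paper's temporal pseudo-label construction is more involved than the simple per-clip averaging used here; the sketch is only meant to make the two training phases concrete.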
Related papers
- CDFSL-V: Cross-Domain Few-Shot Learning for Videos [58.37446811360741]
Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples.
Existing methods in video action recognition rely on large labeled datasets from the same domain.
We propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning.
arXiv Detail & Related papers (2023-09-07T19:44:27Z) - Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey [42.22801056661226]
Video analysis tasks such as action recognition have received increasing research interest with growing applications in fields such as smart healthcare.
Video models trained on existing datasets suffer from significant performance degradation when deployed directly to real-world applications.
Video unsupervised domain adaptation (VUDA) is introduced to adapt video models from the labeled source domain to the unlabeled target domain.
arXiv Detail & Related papers (2022-11-17T05:05:42Z) - Unsupervised Domain Adaptation for Video Transformers in Action Recognition [76.31442702219461]
We propose a simple and novel UDA approach for video action recognition.
Our approach builds a robust source model that generalises better to the target domain.
We report results on two video action recognition benchmarks for UDA.
arXiv Detail & Related papers (2022-07-26T12:17:39Z) - CycDA: Unsupervised Cycle Domain Adaptation from Image to Video [26.30914383638721]
Domain Cycle Adaptation (CycDA) is a cycle-based approach for unsupervised image-to-video domain adaptation.
We evaluate our approach on benchmark datasets for image-to-video and for mixed-source domain adaptation.
arXiv Detail & Related papers (2022-03-30T12:22:26Z) - CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning [49.18591896085498]
We propose CUPID to bridge the domain gap between source and target data.
CUPID yields new state-of-the-art performance across multiple video-language and video tasks.
arXiv Detail & Related papers (2021-04-01T06:42:16Z) - Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation that leverage adversarial learning to unify source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z) - Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top-performing objectives in this class, instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)