Domain Adaptive Video Semantic Segmentation via Cross-Domain Moving
Object Mixing
- URL: http://arxiv.org/abs/2211.02307v1
- Date: Fri, 4 Nov 2022 08:10:33 GMT
- Title: Domain Adaptive Video Semantic Segmentation via Cross-Domain Moving
Object Mixing
- Authors: Kyusik Cho, Suhyeon Lee, Hongje Seong and Euntai Kim
- Abstract summary: We propose Cross-domain Moving Object Mixing (CMOM), which cuts several objects, including hard-to-transfer classes, from the source domain video clip and pastes them into the target domain video clip.
Unlike image-level domain adaptation, the temporal context should be maintained to mix moving objects in two different videos.
We additionally propose Feature Alignment with Temporal Context (FATC) to enhance target domain feature discriminability.
- Score: 15.823918683848877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The network trained for domain adaptation is prone to bias toward the
easy-to-transfer classes. Since the ground truth label on the target domain is
unavailable during training, the bias problem leads to skewed predictions,
failing to predict hard-to-transfer classes. To address this problem, we
propose Cross-domain Moving Object Mixing (CMOM) that cuts several objects,
including hard-to-transfer classes, in the source domain video clip and pastes
them into the target domain video clip. Unlike image-level domain adaptation,
the temporal context should be maintained to mix moving objects in two
different videos. Therefore, we design CMOM to mix consecutive video
frames, so that no unrealistic movements occur. We additionally
propose Feature Alignment with Temporal Context (FATC) to enhance target domain
feature discriminability. FATC exploits the robust source domain features,
which are trained with ground truth labels, to learn discriminative target
domain features in an unsupervised manner by filtering unreliable predictions
with temporal consensus. We demonstrate the effectiveness of the proposed
approaches through extensive experiments. In particular, our model reaches mIoU
of 53.81% on VIPER to Cityscapes-Seq benchmark and mIoU of 56.31% on
SYNTHIA-Seq to Cityscapes-Seq benchmark, surpassing the state-of-the-art
methods by large margins.
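As a rough illustration of the clip-level copy-paste idea behind CMOM (a sketch, not the authors' implementation; all array shapes, names, and class IDs are assumptions), the snippet below pastes source-domain objects of chosen classes into every frame of a target clip using the per-frame source label masks, so a pasted object keeps its source-domain motion across consecutive frames:

```python
import numpy as np

def cmom_mix(src_frames, src_labels, tgt_frames, tgt_labels, classes):
    """Paste source-domain objects of the given classes into a target clip.

    src_frames, tgt_frames: (T, H, W, 3) uint8 video clips.
    src_labels, tgt_labels: (T, H, W) integer label maps.
    classes: iterable of class IDs to cut from the source and paste.

    The per-frame source mask is applied frame by frame, so the pasted
    object moves exactly as it did in the source clip, preserving
    temporal context.
    """
    mixed_frames = tgt_frames.copy()
    mixed_labels = tgt_labels.copy()
    for t in range(len(src_frames)):
        mask = np.isin(src_labels[t], classes)       # (H, W) bool mask
        mixed_frames[t][mask] = src_frames[t][mask]  # copy object pixels
        mixed_labels[t][mask] = src_labels[t][mask]  # copy object labels
    return mixed_frames, mixed_labels
```

Because the same source masks are reused on consecutive frames, the mixed clip inherits consistent object motion rather than the frame-independent pasting of image-level mixing.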
Related papers
- RB-FT: Rationale-Bootstrapped Fine-Tuning for Video Classification [14.224783616912783]
Vision Language Models (VLMs) are becoming increasingly integral to multimedia understanding. They often struggle with domain-specific video classification tasks, particularly with limited data. We propose a two-stage self-improvement paradigm to bridge this gap without new annotations.
arXiv Detail & Related papers (2025-11-19T23:12:18Z)
- Uncertainty-quantified Rollout Policy Adaptation for Unlabelled Cross-domain Temporal Grounding [59.09971455857609]
Video Temporal Grounding aims to temporally locate video segments matching a natural language description in a long video. We introduce a Data-Efficient Unlabelled Cross-domain Temporal Grounding method. This method eliminates the need for target annotation and keeps both computational and storage overhead low enough to run in real time.
arXiv Detail & Related papers (2025-08-08T13:47:00Z)
- Continual Unsupervised Domain Adaptation for Semantic Segmentation using a Class-Specific Transfer [9.46677024179954]
Semantic segmentation models do not generalize to unseen domains.
We propose a light-weight style transfer framework that incorporates two class-conditional AdaIN layers.
We extensively validate our approach on a synthetic sequence and further propose a challenging sequence consisting of real domains.
arXiv Detail & Related papers (2022-08-12T21:30:49Z)
- Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing [55.73722120043086]
We introduce Contrast and Mix (CoMix), a new contrastive learning framework that aims to learn discriminative invariant feature representations for unsupervised video domain adaptation.
First, we utilize temporal contrastive learning to bridge the domain gap by maximizing the similarity between encoded representations of an unlabeled video at two different speeds.
Second, we propose a novel extension to the temporal contrastive loss by using background mixing that allows additional positives per anchor, thus adapting contrastive learning to leverage action semantics shared across both domains.
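The multi-positive contrastive objective summarized above can be sketched as a plain InfoNCE loss that admits several positives per anchor (e.g., the same clip at another speed plus background-mixed variants). This is an illustrative assumption, not CoMix's actual loss code; the function name and signature are made up:

```python
import numpy as np

def info_nce_multi_pos(anchor, positives, negatives, tau=0.1):
    """InfoNCE loss averaged over multiple positives per anchor.

    anchor: (D,) embedding vector.
    positives, negatives: lists of (D,) embedding vectors.
    tau: temperature controlling the sharpness of the softmax.
    """
    def sim(a, b):
        # cosine similarity between two embedding vectors
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    pos = np.array([np.exp(sim(anchor, p) / tau) for p in positives])
    neg_sum = sum(np.exp(sim(anchor, n) / tau) for n in negatives)
    # average the per-positive InfoNCE terms
    return float(np.mean([-np.log(p / (p + neg_sum)) for p in pos]))
```

When the anchor is close to its positives and far from the negatives, each term's softmax ratio approaches 1 and the loss approaches 0.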
arXiv Detail & Related papers (2021-10-28T14:03:29Z)
- Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training for Road Segmentation of Remote Sensing Images [93.50240389540252]
Road segmentation from remote sensing images is a challenging task with a wide range of potential applications.
We propose a novel stagewise domain adaptation model called RoadDA to address the domain shift (DS) issue in this field.
Experiment results on two benchmarks demonstrate that RoadDA can efficiently reduce the domain gap and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-08-28T09:29:14Z)
- Learning Cross-modal Contrastive Features for Video Domain Adaptation [138.75196499580804]
We propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations.
Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies.
arXiv Detail & Related papers (2021-08-26T18:14:18Z)
- Contrastive Learning and Self-Training for Unsupervised Domain Adaptation in Semantic Segmentation [71.77083272602525]
Unsupervised domain adaptation (UDA) attempts to provide efficient knowledge transfer from a labeled source domain to an unlabeled target domain.
We propose a contrastive learning approach that adapts category-wise centroids across domains.
We extend our method with self-training, where we use a memory-efficient temporal ensemble to generate consistent and reliable pseudo-labels.
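The memory-efficient temporal ensemble mentioned above can be sketched as an exponential moving average (EMA) of per-pixel class probabilities, with low-confidence pixels masked out of the pseudo-labels. This is an assumed simplification, not the paper's code; function names and the confidence threshold are illustrative:

```python
import numpy as np

def update_ensemble(ema_probs, new_probs, alpha=0.9):
    """Temporal ensemble: EMA of softmax predictions over training steps.

    Only one running tensor per pixel is stored, regardless of how many
    past predictions contribute, which keeps memory usage constant.
    """
    return alpha * ema_probs + (1 - alpha) * new_probs

def pseudo_labels(ema_probs, threshold=0.9):
    """Derive pseudo-labels from ensembled probabilities.

    ema_probs: (..., C) averaged class probabilities per pixel.
    Pixels whose top-class probability falls below the threshold get
    the ignore index -1 and are excluded from the self-training loss.
    """
    labels = ema_probs.argmax(axis=-1)
    confidence = ema_probs.max(axis=-1)
    labels[confidence < threshold] = -1
    return labels
```

Averaging predictions over time smooths out per-step noise, so the confidence threshold filters pixels where the model is persistently uncertain rather than momentarily wrong.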
arXiv Detail & Related papers (2021-05-05T11:55:53Z)
- Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining and Consistency [93.89773386634717]
Visual domain adaptation involves learning to classify images from a target visual domain using labels available in a different source domain.
We show that in the presence of a few target labels, simple techniques like self-supervision (via rotation prediction) and consistency regularization can be effective without any adversarial alignment to learn a good target classifier.
Our Pretraining and Consistency (PAC) approach can achieve state-of-the-art accuracy on this semi-supervised domain adaptation task, surpassing multiple adversarial domain alignment methods across multiple datasets.
arXiv Detail & Related papers (2021-01-29T18:40:17Z)
- Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation [169.82760468633236]
We propose to build the pixel-level cycle association between source and target pixel pairs.
Our method can be trained end-to-end in one stage and introduces no additional parameters.
arXiv Detail & Related papers (2020-10-31T00:11:36Z)
- Unsupervised Domain Adaptive Object Detection using Forward-Backward Cyclic Adaptation [13.163271874039191]
We present a novel approach to perform the unsupervised domain adaptation for object detection through forward-backward cyclic (FBC) training.
Recent adversarial training based domain adaptation methods have shown their effectiveness on minimizing domain discrepancy via marginal feature distributions alignment.
We propose Forward-Backward Cyclic Adaptation, which iteratively computes adaptation from source to target via backward hopping and from target to source via forward passing.
arXiv Detail & Related papers (2020-02-03T06:24:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.