CycDA: Unsupervised Cycle Domain Adaptation from Image to Video
- URL: http://arxiv.org/abs/2203.16244v3
- Date: Wed, 22 Mar 2023 11:19:19 GMT
- Title: CycDA: Unsupervised Cycle Domain Adaptation from Image to Video
- Authors: Wei Lin, Anna Kukleva, Kunyang Sun, Horst Possegger, Hilde Kuehne,
Horst Bischof
- Abstract summary: Cycle Domain Adaptation (CycDA) is a cycle-based approach for unsupervised image-to-video domain adaptation.
We evaluate our approach on benchmark datasets for image-to-video and for mixed-source domain adaptation.
- Score: 26.30914383638721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although action recognition has achieved impressive results over recent
years, both collection and annotation of video training data are still
time-consuming and cost-intensive. Therefore, image-to-video adaptation has
been proposed to exploit label-free web image sources for adaptation to
unlabeled target videos. This poses two major challenges: (1) spatial domain
shift between web images and video frames; (2) modality gap between image and
video data. To address these challenges, we propose Cycle Domain Adaptation
(CycDA), a cycle-based approach for unsupervised image-to-video domain
adaptation that leverages the joint spatial information in images and videos on
the one hand and, on the other hand, trains an independent spatio-temporal
model to bridge the modality gap. We alternate between spatial and
spatio-temporal learning, with knowledge transfer between the two in each cycle.
We evaluate our approach on benchmark datasets for image-to-video as well as
for mixed-source domain adaptation achieving state-of-the-art results and
demonstrating the benefits of our cyclic adaptation. Code is available at
\url{https://github.com/wlin-at/CycDA}.
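To make the cyclic scheme concrete, the following is a minimal, hypothetical sketch (not the released CycDA code) of the alternation the abstract describes: a spatial image-level model and an independent spatio-temporal video-level model are trained in turn, exchanging confidence-filtered pseudo-labels each cycle. SpatialNet, VideoNet, pseudo_label, and the placeholder backbones are illustrative assumptions, and the spatial alignment between web images and target frames is omitted.

# Hypothetical sketch of the cyclic image-to-video adaptation loop; all names and
# backbones are placeholders, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialNet(nn.Module):
    """Image/frame-level classifier (placeholder backbone)."""
    def __init__(self, num_classes=12):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, num_classes))
    def forward(self, x):                       # x: (B, 3, H, W)
        return self.net(x)

class VideoNet(nn.Module):
    """Clip-level spatio-temporal classifier (placeholder backbone)."""
    def __init__(self, num_classes=12):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 32 * 32, num_classes))
    def forward(self, x):                       # x: (B, 3, T, H, W)
        return self.net(x)

def pseudo_label(model, inputs, threshold=0.8):
    """Return a confidence mask and hard pseudo-labels predicted by `model`."""
    with torch.no_grad():
        probs = F.softmax(model(inputs), dim=1)
        conf, labels = probs.max(dim=1)
    return conf > threshold, labels

def train_step(model, optimizer, inputs, labels):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), labels)
    loss.backward()
    optimizer.step()

# Toy stand-ins for labeled web images and unlabeled target video clips.
src_images, src_labels = torch.randn(64, 3, 32, 32), torch.randint(0, 12, (64,))
tgt_clips = torch.randn(32, 3, 8, 32, 32)
tgt_frames = tgt_clips[:, :, 0]                 # one frame per target clip

spatial, video = SpatialNet(), VideoNet()
opt_s = torch.optim.Adam(spatial.parameters(), lr=1e-3)
opt_v = torch.optim.Adam(video.parameters(), lr=1e-3)

for cycle in range(3):
    # Spatial learning on labeled source images (image-to-frame alignment omitted).
    train_step(spatial, opt_s, src_images, src_labels)
    # Knowledge transfer image -> video: frame-level pseudo-labels supervise the clip model.
    keep, labels = pseudo_label(spatial, tgt_frames)
    if keep.any():
        train_step(video, opt_v, tgt_clips[keep], labels[keep])
    # Knowledge transfer video -> image: clip-level pseudo-labels refine the spatial model.
    keep, labels = pseudo_label(video, tgt_clips)
    if keep.any():
        train_step(spatial, opt_s, tgt_frames[keep], labels[keep])

The sketch only illustrates the alternating pseudo-label exchange between the two models; per the abstract, the full approach additionally exploits the joint spatial information in images and videos within each cycle.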
Related papers
- Time Does Tell: Self-Supervised Time-Tuning of Dense Image
Representations [79.87044240860466]
We propose a novel approach that incorporates temporal consistency in dense self-supervised learning.
Our approach, which we call time-tuning, starts from image-pretrained models and fine-tunes them with a novel self-supervised temporal-alignment clustering loss on unlabeled videos.
Time-tuning improves the state-of-the-art by 8-10% for unsupervised semantic segmentation on videos and matches it for images.
arXiv Detail & Related papers (2023-08-22T21:28:58Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be consistently predicted.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective [37.45565756522847]
We consider the generation of cross-domain videos from two sets of latent factors.
TranSVAE framework is then developed to model such generation.
Experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE.
arXiv Detail & Related papers (2022-08-15T17:59:31Z)
- Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing [55.73722120043086]
We introduce Contrast and Mix (CoMix), a new contrastive learning framework that aims to learn discriminative invariant feature representations for unsupervised video domain adaptation.
First, we utilize temporal contrastive learning to bridge the domain gap by maximizing the similarity between encoded representations of an unlabeled video at two different speeds.
Second, we propose a novel extension to the temporal contrastive loss by using background mixing that allows additional positives per anchor, thus adapting contrastive learning to leverage action semantics shared across both domains.
arXiv Detail & Related papers (2021-10-28T14:03:29Z)
- Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z)
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [80.7397409377659]
We propose an end-to-end trainable model that is designed to take advantage of both large-scale image and video captioning datasets.
Our model is flexible and can be trained on both image and video text datasets, either independently or in conjunction.
We show that this approach yields state-of-the-art results on standard downstream video-retrieval benchmarks.
arXiv Detail & Related papers (2021-04-01T17:48:27Z)
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation that leverage adversarial learning to unify the source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
- Dual-Triplet Metric Learning for Unsupervised Domain Adaptation in Video-Based Face Recognition [8.220945563455848]
A new deep domain adaptation (DA) method is proposed to adapt the CNN embedding of a Siamese network using unlabeled tracklets captured with new video cameras.
The proposed metric learning technique is used to train deep Siamese networks under different training scenarios.
arXiv Detail & Related papers (2020-02-11T05:06:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.