Unsupervised Domain Adaptation for Video Transformers in Action
Recognition
- URL: http://arxiv.org/abs/2207.12842v1
- Date: Tue, 26 Jul 2022 12:17:39 GMT
- Title: Unsupervised Domain Adaptation for Video Transformers in Action
Recognition
- Authors: Victor G. Turrisi da Costa, Giacomo Zara, Paolo Rota, Thiago
Oliveira-Santos, Nicu Sebe, Vittorio Murino, Elisa Ricci
- Abstract summary: We propose a simple and novel UDA approach for video action recognition.
Our approach builds a robust source model that better generalises to the target domain.
We report results on two video action recognition benchmarks for UDA.
- Score: 76.31442702219461
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the last few years, Unsupervised Domain Adaptation (UDA) techniques have
acquired remarkable importance and popularity in computer vision. However, when
compared to the extensive literature available for images, the field of videos
is still relatively unexplored. On the other hand, the performance of a model
in action recognition is heavily affected by domain shift. In this paper, we
propose a simple and novel UDA approach for video action recognition. Our
approach leverages recent advances on spatio-temporal transformers to build a
robust source model that better generalises to the target domain. Furthermore,
our architecture learns domain invariant features thanks to the introduction of
a novel alignment loss term derived from the Information Bottleneck principle.
We report results on two video action recognition benchmarks for UDA, showing
state-of-the-art performance on HMDB$\leftrightarrow$UCF, as well as on
Kinetics$\rightarrow$NEC-Drone, which is more challenging. This demonstrates
the effectiveness of our method in handling different levels of domain shift.
The source code is available at https://github.com/vturrisi/UDAVT.
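As a rough illustration of how a source-supervised objective plus a feature-alignment term can be wired around a video transformer, the following PyTorch sketch uses a small temporal TransformerEncoder as a stand-in backbone and a simple moment-matching penalty in place of the paper's Information-Bottleneck-derived alignment loss. All names (UDAActionModel, alignment_loss), dimensions, and the 0.1 loss weight are illustrative assumptions, not the authors' implementation; the actual method is in the linked repository.

import torch
import torch.nn as nn
import torch.nn.functional as F

class UDAActionModel(nn.Module):
    # Hypothetical stand-in: a tiny temporal transformer over per-frame features
    # plus a linear classifier head (the paper uses a spatio-temporal video transformer).
    def __init__(self, feat_dim=768, num_classes=12):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim) per-frame embeddings
        tokens = self.temporal_encoder(frame_feats)
        clip_feat = tokens.mean(dim=1)  # temporal average pooling
        return clip_feat, self.classifier(clip_feat)

def alignment_loss(source_feats, target_feats):
    # Simple moment-matching stand-in for an alignment term:
    # pull batch-level feature statistics of source and target together.
    mean_gap = (source_feats.mean(0) - target_feats.mean(0)).pow(2).sum()
    var_gap = (source_feats.var(0) - target_feats.var(0)).pow(2).sum()
    return mean_gap + var_gap

# One hypothetical training step: labelled source batch, unlabelled target batch.
model = UDAActionModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
src_x, src_y = torch.randn(4, 16, 768), torch.randint(0, 12, (4,))
tgt_x = torch.randn(4, 16, 768)

src_feat, src_logits = model(src_x)
tgt_feat, _ = model(tgt_x)
loss = F.cross_entropy(src_logits, src_y) + 0.1 * alignment_loss(src_feat, tgt_feat)
optimizer.zero_grad()
loss.backward()
optimizer.step()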
Related papers
- Transferable-guided Attention Is All You Need for Video Domain Adaptation [42.642008092347986]
Unsupervised domain adaptation (UDA) in videos is a challenging task that remains not well explored compared to image-based UDA techniques.
Our key idea is to use transformer layers as a feature encoder and incorporate spatial and temporal transferability relationships into the attention mechanism.
A Transferable-guided Attention (TransferAttn) framework is then developed to exploit the capacity of the transformer to adapt cross-domain knowledge.
arXiv Detail & Related papers (2024-07-01T15:29:27Z) - Vision Transformer-based Adversarial Domain Adaptation [5.611768906855499]
Vision transformer (ViT) has attracted tremendous attention since its emergence and has been widely used in various computer vision tasks.
In this paper, we fill this gap by employing the ViT as the feature extractor in adversarial domain adaptation.
We empirically demonstrate that ViT can be a plug-and-play component in adversarial domain adaptation.
arXiv Detail & Related papers (2024-04-24T11:41:28Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - Unsupervised Video Domain Adaptation for Action Recognition: A
Disentanglement Perspective [37.45565756522847]
We consider the generation of cross-domain videos from two sets of latent factors.
The TranSVAE framework is then developed to model such generation.
Experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE.
arXiv Detail & Related papers (2022-08-15T17:59:31Z) - Learning Cross-modal Contrastive Features for Video Domain Adaptation [138.75196499580804]
We propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations.
Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies.
arXiv Detail & Related papers (2021-08-26T18:14:18Z) - Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z) - JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion
Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z) - Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation which leverage adversarial learning to unify the source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.