Augmenting and Aligning Snippets for Few-Shot Video Domain Adaptation
- URL: http://arxiv.org/abs/2303.10451v1
- Date: Sat, 18 Mar 2023 16:33:56 GMT
- Title: Augmenting and Aligning Snippets for Few-Shot Video Domain Adaptation
- Authors: Yuecong Xu, Jianfei Yang, Yunjiao Zhou, Zhenghua Chen, Min Wu, Xiaoli Li
- Abstract summary: Video Unsupervised Domain Adaptation (VUDA) has been introduced to improve the robustness and transferability of video models.
We consider a more realistic Few-Shot Video-based Domain Adaptation (FSVDA) scenario where we adapt video models with only a few target video samples.
We propose SSA2lign, a novel method that addresses FSVDA at the snippet level, where the target domain is expanded through a simple snippet-level augmentation.
- Score: 22.097165083633175
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: For video models to be transferred and applied seamlessly across video tasks
in varied environments, Video Unsupervised Domain Adaptation (VUDA) has been
introduced to improve the robustness and transferability of video models.
However, current VUDA methods rely on a vast amount of high-quality unlabeled
target data, which may not be available in real-world cases. We thus consider a
more realistic \textit{Few-Shot Video-based Domain Adaptation} (FSVDA) scenario
where we adapt video models with only a few target video samples. While a few
methods have touched upon Few-Shot Domain Adaptation (FSDA) in images and in
FSVDA, they rely primarily on spatial augmentation for target domain expansion
with alignment performed statistically at the instance level. However, videos
contain more knowledge in terms of rich temporal and semantic information,
which should be fully considered while augmenting target domains and performing
alignment in FSVDA. We propose SSA2lign, a novel method that addresses FSVDA at
the snippet level: the target domain is expanded through a simple snippet-level
augmentation, followed by attentive alignment of snippets both semantically and
statistically, with semantic alignment conducted from multiple perspectives.
Empirical results demonstrate state-of-the-art performance of SSA2lign across
multiple cross-domain action recognition benchmarks.
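The abstract describes SSA2lign only at a high level and this summary includes no code. The Python sketch below is a minimal illustration of the general snippet-level idea under stated assumptions: a few target videos are expanded into many snippets, and a simple moment-matching loss stands in for the statistical alignment step. The function names, snippet lengths, and the moment-matching loss are illustrative assumptions, not the authors' implementation, and the semantic (class-aware) alignment from the paper is not covered here.

```python
# Illustrative sketch only: snippet-level augmentation of a few target videos
# and a simple moment-matching loss standing in for statistical alignment.
# All names and hyper-parameters here are assumptions, not SSA2lign's code.
import torch


def augment_snippets(video: torch.Tensor, snippet_len: int = 8,
                     num_snippets: int = 4) -> torch.Tensor:
    """Expand one target video of shape (T, C, H, W) into several snippets by
    sampling random temporally contiguous windows."""
    t = video.shape[0]
    starts = torch.randint(0, max(t - snippet_len, 1), (num_snippets,))
    return torch.stack([video[s:s + snippet_len] for s in starts.tolist()])


def moment_alignment_loss(source_feats: torch.Tensor,
                          target_feats: torch.Tensor) -> torch.Tensor:
    """Penalise the gap between first and second moments of snippet features
    from the two domains (a simple proxy for statistical alignment)."""
    mean_gap = (source_feats.mean(dim=0) - target_feats.mean(dim=0)).pow(2).sum()
    var_gap = (source_feats.var(dim=0) - target_feats.var(dim=0)).pow(2).sum()
    return mean_gap + var_gap


if __name__ == "__main__":
    # Few-shot target domain: only two target videos of 32 frames each.
    target_videos = [torch.randn(32, 3, 64, 64) for _ in range(2)]
    # Snippet-level augmentation expands them into 2 * 4 = 8 snippets.
    target_snippets = torch.cat([augment_snippets(v) for v in target_videos])
    print(target_snippets.shape)  # torch.Size([8, 8, 3, 64, 64])

    # Pretend these snippet features came from a shared video backbone.
    source_feats = torch.randn(64, 256)
    target_feats = torch.randn(8, 256)
    print(moment_alignment_loss(source_feats, target_feats))
```

In the paper's setting the snippet features would come from a video backbone and the alignment would also exploit semantic (label) information; the sketch only shows how snippet augmentation can enlarge a few-shot target set before any domain-level statistic is matched.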
Related papers
- Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval [58.17315970207874]
Video Moment Retrieval (VMR) requires precise modelling of fine-grained moment-text associations to capture intricate visual-language relationships.
Existing methods resort to joint training on both source and target domain videos for cross-domain applications.
We explore generative video diffusion for fine-grained editing of source videos controlled by the target sentences.
arXiv Detail & Related papers (2024-01-24T09:45:40Z)
- Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding [59.599378814835205]
Temporal Video Grounding (TVG) aims to localize the temporal boundary of a specific segment in an untrimmed video based on a given language query.
We introduce a novel AMDA method to adaptively adjust the model's scene-related knowledge by incorporating insights from the target data.
arXiv Detail & Related papers (2023-12-21T07:49:27Z)
- Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free Domain Adaptation for Video Semantic Segmentation [117.39092621796753]
The Source-Free Domain Adaptation (SFDA) setup aims to adapt a source-trained model to the target domain without accessing source data.
A novel method that takes full advantage of correlations of temporal information to tackle the absence of source data is proposed.
Experiments show that PixelL achieves state-of-the-art performance on benchmarks compared to current UDA and SFDA approaches.
arXiv Detail & Related papers (2023-03-25T05:06:23Z)
- Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey [42.22801056661226]
Video analysis tasks such as action recognition have received increasing research interest with growing applications in fields such as smart healthcare.
Video models trained on existing datasets suffer from significant performance degradation when deployed directly to real-world applications.
Video unsupervised domain adaptation (VUDA) is introduced to adapt video models from the labeled source domain to the unlabeled target domain.
arXiv Detail & Related papers (2022-11-17T05:05:42Z)
- Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective [37.45565756522847]
We consider the generation of cross-domain videos from two sets of latent factors.
The TranSVAE framework is then developed to model such generation.
Experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE.
arXiv Detail & Related papers (2022-08-15T17:59:31Z)
- Unsupervised Domain Adaptation for Video Transformers in Action Recognition [76.31442702219461]
We propose a simple and novel UDA approach for video action recognition.
Our approach builds a robust source model that generalises better to the target domain.
We report results on two video action recognition benchmarks for UDA.
arXiv Detail & Related papers (2022-07-26T12:17:39Z)
- Learning Temporal Consistency for Source-Free Video Domain Adaptation [16.230405375192262]
In real-world applications, subjects and scenes in the source video domain should be irrelevant to those in the target video domain.
To cope with this concern, a more practical domain adaptation scenario is formulated as Source-Free Video-based Domain Adaptation (SFVDA).
We propose a novel Attentive Temporal Consistent Network (ATCoN) to address SFVDA by learning temporal consistency.
arXiv Detail & Related papers (2022-03-09T07:33:36Z)
- Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z)
- Semi-Supervised Domain Adaptation via Adaptive and Progressive Feature Alignment [32.77436219094282]
SSDAS employs a few labeled target samples as anchors for adaptive and progressive feature alignment between labeled source samples and unlabeled target samples.
In addition, we continuously replace dissimilar source features with high-confidence target features during the iterative training process.
Extensive experiments show the proposed SSDAS greatly outperforms a number of baselines.
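The SSDAS summary names this anchor-and-replace mechanism only at a high level. As a rough illustration of the idea of swapping out source features that sit far from the few labelled target anchors, one might write something like the sketch below; the cosine-similarity criterion, thresholds, and confidence definition are assumptions, not the SSDAS implementation.

```python
# Illustrative sketch only: replacing source features that are dissimilar to
# the few labelled target anchors with high-confidence target features.
# Similarity measure, thresholds, and confidence are assumed, not from SSDAS.
import torch
import torch.nn.functional as F


def replace_dissimilar_source_features(source_feats, anchor_feats, target_feats,
                                        target_conf, sim_threshold=0.5,
                                        conf_threshold=0.9):
    """source_feats: (Ns, D), anchor_feats: (Na, D), target_feats: (Nt, D),
    target_conf: (Nt,), e.g. max softmax probability of each target sample."""
    # Cosine similarity between every source feature and its closest anchor.
    sim = F.cosine_similarity(source_feats.unsqueeze(1),
                              anchor_feats.unsqueeze(0), dim=-1)
    dissimilar = sim.max(dim=1).values < sim_threshold

    confident = target_feats[target_conf > conf_threshold]
    if confident.shape[0] == 0 or not bool(dissimilar.any()):
        return source_feats

    # Fill the dissimilar slots with randomly drawn confident target features.
    idx = torch.randint(0, confident.shape[0], (int(dissimilar.sum()),))
    updated = source_feats.clone()
    updated[dissimilar] = confident[idx]
    return updated


if __name__ == "__main__":
    out = replace_dissimilar_source_features(
        torch.randn(16, 32), torch.randn(3, 32), torch.randn(8, 32),
        torch.rand(8))
    print(out.shape)  # torch.Size([16, 32])
```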
arXiv Detail & Related papers (2021-06-05T09:12:50Z)
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation that leverage adversarial learning to unify source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)