TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
- URL: http://arxiv.org/abs/2411.19041v1
- Date: Thu, 28 Nov 2024 10:38:05 GMT
- Title: TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
- Authors: Yilong Wang, Zilin Gao, Qilong Wang, Zhaofeng Chen, Peihua Li, Qinghua Hu,
- Abstract summary: Cross-domain action recognition (CDFSAR) has attracted recent research interests.
This paper proposes a simple yet effective baseline, namely Temporal-Aware Model Tuning (TAMT) for CDFSAR.
Our TAMT involves a decoupled paradigm by performing pre-training on source data and fine-tuning target data, which avoids retraining for multiple target data with single source.
- Score: 39.073835841717184
- License:
- Abstract: Going beyond few-shot action recognition (FSAR), cross-domain FSAR (CDFSAR) has attracted recent research interests by solving the domain gap lying in source-to-target transfer learning. Existing CDFSAR methods mainly focus on joint training of source and target data to mitigate the side effect of domain gap. However, such kind of methods suffer from two limitations: First, pair-wise joint training requires retraining deep models in case of one source data and multiple target ones, which incurs heavy computation cost, especially for large source and small target data. Second, pre-trained models after joint training are adopted to target domain in a straightforward manner, hardly taking full potential of pre-trained models and then limiting recognition performance. To overcome above limitations, this paper proposes a simple yet effective baseline, namely Temporal-Aware Model Tuning (TAMT) for CDFSAR. Specifically, our TAMT involves a decoupled paradigm by performing pre-training on source data and fine-tuning target data, which avoids retraining for multiple target data with single source. To effectively and efficiently explore the potential of pre-trained models in transferring to target domain, our TAMT proposes a Hierarchical Temporal Tuning Network (HTTN), whose core involves local temporal-aware adapters (TAA) and a global temporal-aware moment tuning (GTMT). Particularly, TAA learns few parameters to recalibrate the intermediate features of frozen pre-trained models, enabling efficient adaptation to target domains. Furthermore, GTMT helps to generate powerful video representations, improving match performance on the target domain. Experiments on several widely used video benchmarks show our TAMT outperforms the recently proposed counterparts by 13%$\sim$31%, achieving new state-of-the-art CDFSAR results.
Related papers
- Bridge then Begin Anew: Generating Target-relevant Intermediate Model for Source-free Visual Emotion Adaptation [22.638915084704344]
Visual emotion recognition (VER) aims at understanding humans' emotional reactions toward different visual stimuli.
domain adaptation offers an alternative solution by adapting models trained on labeled source data to unlabeled target data.
Due to privacy concerns, source emotional data may be inaccessible.
We propose a novel framework termed Bridge then Begin Anew (BBA), which consists of two steps: domain-bridged model generation (DMG) and target-related model adaptation (TMA)
arXiv Detail & Related papers (2024-12-18T07:51:35Z) - Enhancing Test Time Adaptation with Few-shot Guidance [35.13317598777832]
Deep neural networks often encounter significant performance drops while facing with domain shifts between training (source) and test (target) data.
Test Time Adaptation (TTA) methods have been proposed to adapt pre-trained source model to handle out-of-distribution streaming target data.
We develop Few-Shot Test Time Adaptation (FS-TTA), a novel and practical setting that utilizes a few-shot support set on top of TTA.
arXiv Detail & Related papers (2024-09-02T15:50:48Z) - Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models [3.072340427031969]
Few-shot action recognition (FSAR) aims to learn a model capable of identifying novel actions in videos using only a few examples.
In assuming the base dataset seen during meta-training and novel dataset used for evaluation can come from different domains, cross-domain few-shot learning alleviates data collection and annotation costs.
We systematically evaluate existing state-of-the-art single-domain, transfer-based, and cross-domain FSAR methods on new cross-domain tasks.
arXiv Detail & Related papers (2024-06-03T07:48:18Z) - DG-TTA: Out-of-domain medical image segmentation through Domain Generalization and Test-Time Adaptation [43.842694540544194]
We propose to combine domain generalization and test-time adaptation to create a highly effective approach for reusing pre-trained models in unseen target domains.
We demonstrate that our method, combined with pre-trained whole-body CT models, can effectively segment MR images with high accuracy.
arXiv Detail & Related papers (2023-12-11T10:26:21Z) - Subject-Based Domain Adaptation for Facial Expression Recognition [51.10374151948157]
Adapting a deep learning model to a specific target individual is a challenging facial expression recognition task.
This paper introduces a new MSDA method for subject-based domain adaptation in FER.
It efficiently leverages information from multiple source subjects to adapt a deep FER model to a single target individual.
arXiv Detail & Related papers (2023-12-09T18:40:37Z) - Adapting the Mean Teacher for keypoint-based lung registration under
geometric domain shifts [75.51482952586773]
deep neural networks generally require plenty of labeled training data and are vulnerable to domain shifts between training and test data.
We present a novel approach to geometric domain adaptation for image registration, adapting a model from a labeled source to an unlabeled target domain.
Our method consistently improves on the baseline model by 50%/47% while even matching the accuracy of models trained on target data.
arXiv Detail & Related papers (2022-07-01T12:16:42Z) - Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training
for Road Segmentation of Remote Sensing Images [93.50240389540252]
Road segmentation from remote sensing images is a challenging task with wide ranges of application potentials.
We propose a novel stagewise domain adaptation model called RoadDA to address the domain shift (DS) issue in this field.
Experiment results on two benchmarks demonstrate that RoadDA can efficiently reduce the domain gap and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-08-28T09:29:14Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the relevant knowledge with respect to the target task from the original source model and used as a regularizer during fine-tuning the target model.
Experiments on various real world datasets show that our method stably improves the standard fine-tuning by more than 2% in average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - A Transductive Multi-Head Model for Cross-Domain Few-Shot Learning [72.30054522048553]
We present a new method, Transductive Multi-Head Few-Shot learning (TMHFS), to address the Cross-Domain Few-Shot Learning challenge.
The proposed methods greatly outperform the strong baseline, fine-tuning, on four different target domains.
arXiv Detail & Related papers (2020-06-08T02:39:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.