Video Test-Time Adaptation for Action Recognition
- URL: http://arxiv.org/abs/2211.15393v3
- Date: Mon, 20 Mar 2023 23:48:02 GMT
- Title: Video Test-Time Adaptation for Action Recognition
- Authors: Wei Lin, Muhammad Jehanzeb Mirza, Mateusz Kozinski, Horst Possegger,
Hilde Kuehne, Horst Bischof
- Abstract summary: Action recognition systems are vulnerable to unanticipated distribution shifts in test data.
We propose a test-time adaptation of video action recognition models against common distribution shifts.
Our proposed method demonstrates a substantial performance gain over existing test-time adaptation approaches.
- Score: 24.596473019563398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although action recognition systems can achieve top performance when
evaluated on in-distribution test points, they are vulnerable to unanticipated
distribution shifts in test data. However, test-time adaptation of video action
recognition models against common distribution shifts has so far not been
demonstrated. We propose to address this problem with an approach tailored to
spatio-temporal models that is capable of adaptation on a single video sample
at a step. It consists in a feature distribution alignment technique that
aligns online estimates of test set statistics towards the training statistics.
We further enforce prediction consistency over temporally augmented views of
the same test video sample. Evaluations on three benchmark action recognition
datasets show that our proposed technique is architecture-agnostic and able to
significantly boost the performance of both the state-of-the-art convolutional
architecture TANet and the Video Swin Transformer. Our proposed method
demonstrates a substantial performance gain over existing test-time adaptation
approaches in both evaluations of a single distribution shift and the
challenging case of random distribution shifts. Code will be available at
https://github.com/wlin-at/ViTTA.
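The two ingredients named in the abstract, aligning online estimates of test statistics with the training statistics and enforcing prediction consistency across temporally augmented views, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the exponential moving average and its momentum, the L1 distance, the form of the consistency term, and all names (`OnlineStats`, `alignment_loss`, `consistency_loss`). The sketch only computes loss values; an actual adaptation step would backpropagate them through the network.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OnlineStats:
    """Running per-channel mean/variance, updated one video at a time (hypothetical helper)."""
    momentum: float = 0.1  # assumed value, not from the abstract
    mean: Optional[List[float]] = None
    var: Optional[List[float]] = None

    def update(self, features):
        # features: list of feature vectors extracted from ONE test video
        n, dim = len(features), len(features[0])
        cur_mean = [sum(f[c] for f in features) / n for c in range(dim)]
        cur_var = [sum((f[c] - cur_mean[c]) ** 2 for f in features) / n
                   for c in range(dim)]
        if self.mean is None:
            # first video: initialise the running estimates
            self.mean, self.var = cur_mean, cur_var
        else:
            m = self.momentum
            self.mean = [(1 - m) * o + m * c for o, c in zip(self.mean, cur_mean)]
            self.var = [(1 - m) * o + m * c for o, c in zip(self.var, cur_var)]
        return self.mean, self.var


def alignment_loss(test_mean, test_var, train_mean, train_var):
    """L1 distance between online test statistics and stored training statistics."""
    return (sum(abs(a - b) for a, b in zip(test_mean, train_mean))
            + sum(abs(a - b) for a, b in zip(test_var, train_var)))


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(logits_a, logits_b):
    """Mean absolute deviation of two views' class probabilities from their average."""
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    avg = [(a + b) / 2 for a, b in zip(p_a, p_b)]
    return sum(abs(a - m) + abs(b - m) for a, b, m in zip(p_a, p_b, avg)) / len(avg)


# Toy usage: two 2-channel feature vectors from a single test video.
stats = OnlineStats(momentum=0.1)
mean, var = stats.update([[1.0, 2.0], [3.0, 2.0]])
train_mean, train_var = [0.0, 0.0], [1.0, 1.0]  # placeholder training statistics
total = (alignment_loss(mean, var, train_mean, train_var)
         + 0.1 * consistency_loss([2.0, 0.5], [1.5, 0.7]))  # 0.1 is an assumed weight
```

Keeping running estimates rather than per-batch statistics is what would make adaptation on one video per step plausible in such a setup, since a single sample gives a noisy estimate on its own.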
Related papers
- DOTA: Distributional Test-Time Adaptation of Vision-Language Models [52.98590762456236]
Training-free test-time dynamic adapter (TDA) is a promising approach to address this issue.
We propose a simple yet effective method for DistributiOnal Test-time Adaptation (Dota).
Dota continually estimates the distributions of test samples, allowing the model to continually adapt to the deployment environment.
arXiv Detail & Related papers (2024-09-28T15:03:28Z)
- Protected Test-Time Adaptation via Online Entropy Matching: A Betting Approach [14.958884168060097]
We present a novel approach for test-time adaptation via online self-training.
Our approach combines concepts in betting martingales and online learning to form a detection tool capable of reacting to distribution shifts.
Experimental results demonstrate that our approach improves test-time accuracy under distribution shifts while maintaining accuracy and calibration in their absence.
arXiv Detail & Related papers (2024-08-14T12:40:57Z)
- Test-time Distribution Learning Adapter for Cross-modal Visual Reasoning [16.998833621046117]
We propose Test-Time Distribution LearNing Adapter (TT-DNA) which directly works during the testing period.
Specifically, we estimate Gaussian distributions to model visual features of the few-shot support images to capture the knowledge from the support set.
Our extensive experimental results on visual reasoning for human object interaction demonstrate that our proposed TT-DNA outperforms existing state-of-the-art methods by large margins.
arXiv Detail & Related papers (2024-03-10T01:34:45Z)
- Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model [0.0]
This work explores the performance of a large video understanding foundation model on the downstream task of human fall detection on untrimmed video.
A method for temporal action localization that relies on a simple cutup of untrimmed videos is demonstrated.
The results are promising for real-time application, and the falls are detected on video level with a state-of-the-art 0.96 F1 score on the HQFSD dataset.
arXiv Detail & Related papers (2024-01-29T16:37:00Z)
- Adversarial Augmentation Training Makes Action Recognition Models More Robust to Realistic Video Distribution Shifts [13.752169303624147]
Action recognition models often lack robustness when faced with natural distribution shifts between training and test data.
We propose two novel evaluation methods to assess model resilience to such distribution disparity.
We experimentally demonstrate the superior performance of the proposed adversarial augmentation approach over baselines across three state-of-the-art action recognition models.
arXiv Detail & Related papers (2024-01-21T05:50:39Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
- Test-time Adaptation with Slot-Centric Models [63.981055778098444]
Slot-TTA is a semi-supervised scene decomposition model that at test time is adapted per scene through gradient descent on reconstruction or cross-view synthesis objectives.
We show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors, and alternative test-time adaptation methods.
arXiv Detail & Related papers (2022-03-21T17:59:50Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels.
Our method leverages per-frame person detectors, which have been trained on large image datasets, within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.