Adversarial Augmentation Training Makes Action Recognition Models More
Robust to Realistic Video Distribution Shifts
- URL: http://arxiv.org/abs/2401.11406v1
- Date: Sun, 21 Jan 2024 05:50:39 GMT
- Title: Adversarial Augmentation Training Makes Action Recognition Models More
Robust to Realistic Video Distribution Shifts
- Authors: Kiyoon Kim, Shreyank N Gowda, Panagiotis Eustratiadis, Antreas
Antoniou, Robert B Fisher
- Abstract summary: Action recognition models often lack robustness when faced with natural distribution shifts between training and test data.
We propose two novel evaluation methods to assess model resilience to such distribution disparity.
We experimentally demonstrate the superior performance of the proposed adversarial augmentation approach over baselines across three state-of-the-art action recognition models.
- Score: 13.752169303624147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent advances in video action recognition achieving strong
performance on existing benchmarks, these models often lack robustness when
faced with natural distribution shifts between training and test data. We
propose two novel evaluation methods to assess model resilience to such
distribution disparity. One method uses two datasets collected from different
sources, one for training and validation and the other for testing. More
precisely, we created dataset splits of HMDB-51 or UCF-101 for
training, and Kinetics-400 for testing, using the subset of the classes that
are overlapping in both train and test datasets. The other proposed method
extracts the feature mean of each class from the target evaluation dataset's
training data (i.e. the class prototype) and predicts each test video by the
cosine similarity between its features and the class prototype of each target
class. This procedure does not alter model weights using the target dataset and
does not require aligning overlapping classes of two different datasets, and is
thus a very efficient way to test model robustness to distribution shifts
without prior knowledge of the target distribution. We
address the robustness problem with adversarial augmentation training:
generating augmented views of videos that are "hard" for the classification
model by applying gradient ascent on the augmentation parameters, combined with
a "curriculum" schedule for the strength of the video augmentations. We
experimentally demonstrate the superior performance of the proposed adversarial
augmentation approach over baselines across three state-of-the-art action
recognition models - TSM, Video Swin Transformer, and Uniformer. This work
provides critical insight into model robustness under distribution shifts and
offers effective techniques to enhance video action recognition performance in
real-world deployments.
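The prototype-based evaluation described above can be sketched as follows. This is a minimal NumPy illustration, assuming features have already been extracted by a frozen backbone; the function names and shapes are illustrative, not taken from the paper's code:

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class from the target dataset's training data.

    features: (N, D) array of extracted video features; labels: (N,) class ids.
    Returns L2-normalised prototypes of shape (num_classes, D).
    """
    protos = np.stack([features[labels == c].mean(axis=0)
                       for c in range(num_classes)])
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def prototype_predict(test_features, protos):
    """Predict each test video by cosine similarity to the class prototypes.

    No model weights are updated and no class alignment between datasets
    is needed; only the frozen feature extractor is reused.
    """
    f = test_features / np.linalg.norm(test_features, axis=1, keepdims=True)
    sims = f @ protos.T          # cosine similarity, since both sides are unit-norm
    return sims.argmax(axis=1)
```

Because the backbone stays frozen, this evaluation only costs one feature pass over the target training data plus a matrix product per test video.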
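The adversarial augmentation idea, gradient ascent on augmentation parameters to produce "hard" views, can be illustrated on a toy logistic classifier with a single brightness-shift parameter. The paper applies this to video augmentations and deep action recognition models; everything below, including the `max_strength` cap standing in for the curriculum schedule, is an illustrative simplification:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_brightness(x, y, w, steps=5, lr=0.5, max_strength=1.0):
    """Gradient ascent on one augmentation parameter (a brightness shift b)
    so the augmented view x + b becomes "hard" for a toy logistic model
    p(y=1|x) = sigmoid(w @ x). `max_strength` caps the shift and plays the
    role of the curriculum-scheduled augmentation strength."""
    b = 0.0
    for _ in range(steps):
        p = sigmoid(w @ (x + b))      # model confidence on the augmented view
        grad = (p - y) * w.sum()      # d(cross-entropy)/db; ascend to raise loss
        b = float(np.clip(b + lr * grad, -max_strength, max_strength))
    return x + b, b
```

In training, the hard views produced this way would be fed back to the classifier as extra correctly-labelled examples, while a curriculum schedule gradually raises `max_strength` over epochs so early training sees only mild augmentations.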
Related papers
- Consistency Regularization for Generalizable Source-free Domain
Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting data from unseen but identically distributed test sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
- Universal Semi-supervised Model Adaptation via Collaborative Consistency
Training [92.52892510093037]
We introduce a realistic and challenging domain adaptation problem called Universal Semi-supervised Model Adaptation (USMA)
We propose a collaborative consistency training framework that regularizes the prediction consistency between two models.
Experimental results demonstrate the effectiveness of our method on several benchmark datasets.
arXiv Detail & Related papers (2023-07-07T08:19:40Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Texture-Based Input Feature Selection for Action Recognition [3.9596068699962323]
We propose a novel method to determine the task-irrelevant content in inputs which increases the domain discrepancy.
We show that our proposed model is superior to existing models for action recognition on the HMDB-51 dataset and the Penn Action dataset.
arXiv Detail & Related papers (2023-02-28T23:56:31Z)
- Effective Robustness against Natural Distribution Shifts for Models with
Different Training Data [113.21868839569]
"Effective robustness" measures the extra out-of-distribution robustness beyond what can be predicted from the in-distribution (ID) performance.
We propose a new evaluation metric to evaluate and compare the effective robustness of models trained on different data.
arXiv Detail & Related papers (2023-02-02T19:28:41Z)
- Video Test-Time Adaptation for Action Recognition [24.596473019563398]
Action recognition systems are vulnerable to unanticipated distribution shifts in test data.
We propose a test-time adaptation of video action recognition models against common distribution shifts.
Our proposed method demonstrates a substantial performance gain over existing test-time adaptation approaches.
arXiv Detail & Related papers (2022-11-24T10:49:54Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video
Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize the well-pretrained language model to generate good semantic target for efficient transferring learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- Wavelet-Based Hybrid Machine Learning Model for Out-of-distribution
Internet Traffic Prediction [3.689539481706835]
This paper investigates machine learning performances using eXtreme Gradient Boosting, Light Gradient Boosting Machine, Gradient Descent, Gradient Boosting Regressor, Cat Regressor.
We propose a hybrid machine learning model integrating wavelet decomposition for improving out-of-distribution prediction.
arXiv Detail & Related papers (2022-05-09T14:34:42Z)
- Few Shot Activity Recognition Using Variational Inference [9.371378627575883]
We propose a novel variational inference based architectural framework (HF-AR) for few shot activity recognition.
Our framework leverages volume-preserving Householder Flow to learn a flexible posterior distribution of the novel classes.
This results in better performance as compared to state-of-the-art few shot approaches for human activity recognition.
arXiv Detail & Related papers (2021-08-20T03:57:58Z)
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation which leverage adversarial learning to unify the source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.