C3T: Cross-modal Transfer Through Time for Human Action Recognition
- URL: http://arxiv.org/abs/2407.16803v2
- Date: Thu, 7 Nov 2024 17:10:15 GMT
- Title: C3T: Cross-modal Transfer Through Time for Human Action Recognition
- Authors: Abhi Kamboj, Anh Duy Nguyen, Minh Do
- Abstract summary: We formalize and explore an understudied cross-modal transfer setting we term Unsupervised Modality Adaptation (UMA).
We develop three methods to perform UMA: Student-Teacher (ST), Contrastive Alignment (CA), and Cross-modal Transfer Through Time (C3T).
The results indicate that C3T is the most robust and highest-performing method, by a margin of at least 8%, and that it approaches supervised-setting performance even in the presence of temporal noise.
- Score: 0.8192907805418581
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: To unlock the potential of diverse sensors, we investigate a method for transferring knowledge between modalities using the structure of a unified multimodal representation space for Human Action Recognition (HAR). We formalize and explore an understudied cross-modal transfer setting we term Unsupervised Modality Adaptation (UMA), in which the modality used at test time is not used in supervised training, i.e., zero labeled instances of the test modality are available during training. We develop three methods to perform UMA: Student-Teacher (ST), Contrastive Alignment (CA), and Cross-modal Transfer Through Time (C3T). Our extensive experiments on various camera+IMU datasets compare these methods to each other in the UMA setting, and to their empirical upper bound in the supervised setting. The results indicate that C3T is the most robust and highest-performing method, by a margin of at least 8%, and that it approaches supervised-setting performance even in the presence of temporal noise. C3T introduces a novel mechanism for aligning signals across time-varying latent vectors extracted from the receptive field of temporal convolutions. Our findings suggest that C3T has significant potential for developing generalizable models for time-series sensor data, opening new avenues for multi-modal learning in various applications.
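To make the abstract's description of Contrastive Alignment (CA) and C3T's alignment of time-varying latent vectors more concrete, below is a minimal PyTorch-style sketch. The encoder architecture, the InfoNCE objective, and all module names and dimensions (`TemporalEncoder`, `info_nce`, `c3t_alignment_loss`) are illustrative assumptions, not the authors' released implementation; the sketch only shows the idea that each modality's temporal-convolution encoder yields a sequence of latent vectors (one per receptive field) which are then aligned across modalities per time step.

```python
# Illustrative sketch of cross-modal alignment over time-varying latent vectors.
# All names, dimensions, and the exact loss are assumptions for exposition only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalEncoder(nn.Module):
    """1D temporal-convolution encoder that keeps a sequence of latent vectors,
    one per temporal receptive field, instead of a single pooled embedding."""

    def __init__(self, in_channels: int, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, dim, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -> (batch, time', dim)
        return self.net(x).transpose(1, 2)


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss between two batches of embeddings (CA-style alignment)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


def c3t_alignment_loss(z_video: torch.Tensor, z_imu: torch.Tensor) -> torch.Tensor:
    """Align the two modalities per time step, so temporally corresponding latent
    vectors from each encoder's receptive field form the positive pairs."""
    b, t, d = z_video.shape
    # Flatten time into the batch: each time step becomes its own cross-modal pair.
    return info_nce(z_video.reshape(b * t, d), z_imu.reshape(b * t, d))


if __name__ == "__main__":
    video_enc = TemporalEncoder(in_channels=512)   # e.g., per-frame features over time
    imu_enc = TemporalEncoder(in_channels=6)       # e.g., 3-axis accelerometer + gyroscope
    video = torch.randn(8, 512, 64)
    imu = torch.randn(8, 6, 64)
    loss = c3t_alignment_loss(video_enc(video), imu_enc(imu))
    loss.backward()
    print(loss.item())
```

In this reading, the difference from a plain CA-style objective is only where the positive pairs come from: CA would pool each clip into a single embedding per modality, whereas the time-step-wise pairing above aligns sequences of latent vectors, which the abstract suggests is what helps C3T tolerate temporal noise.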
Related papers
- OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation [67.56268991234371]
OV-Uni3DETR achieves state-of-the-art performance across various scenarios, surpassing existing methods by more than 6% on average.
Code and pre-trained models will be released later.
arXiv Detail & Related papers (2024-03-28T17:05:04Z)
- Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping [12.442574943138794]
The paper explores the industrial multimodal Anomaly Detection (AD) task, which exploits point clouds and RGB images to localize anomalies.
We introduce a novel, lightweight, and fast framework that learns to map features from one modality to the other on nominal samples.
arXiv Detail & Related papers (2023-12-07T18:41:21Z)
- FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration [89.4165092674947]
Multi-modality fusion and multi-task learning are becoming increasingly common in 3D autonomous driving scenarios.
Previous works manually coordinate the learning framework with empirical knowledge, which may lead to sub-optimal results.
We propose a novel yet simple multi-level gradient calibration learning framework across tasks and modalities during optimization.
arXiv Detail & Related papers (2023-07-31T12:50:15Z)
- Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted attention from the multimedia and computer vision communities.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z)
- CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation [130.08432609780374]
In 3D action recognition, there exists rich complementary information between skeleton modalities.
We propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs.
Our approach outperforms existing self-supervised methods and sets a series of new records.
arXiv Detail & Related papers (2022-08-26T06:06:09Z)
- SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection [108.57862846523858]
We revisit the self-supervised multi-task learning framework, proposing several updates to the original method.
We modernize the 3D convolutional backbone by introducing multi-head self-attention modules.
In our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps.
arXiv Detail & Related papers (2022-07-16T19:25:41Z)
- MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation [104.48766162008815]
We propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation.
To design a framework that can take full advantage of multi-modality, each modality provides regularized self-supervisory signals to other modalities.
Our regularized pseudo labels produce stable self-learning signals in numerous multi-modal test-time adaptation scenarios.
arXiv Detail & Related papers (2022-04-27T02:28:12Z)
- Transfer Learning for Autonomous Chatter Detection in Machining [0.9281671380673306]
Large-amplitude chatter vibrations are one of the most important phenomena in machining processes.
Three challenges can be identified in applying machine learning to chatter detection broadly in industry.
These three challenges can be grouped under the umbrella of transfer learning.
arXiv Detail & Related papers (2022-04-11T20:46:06Z)
- 3DCFS: Fast and Robust Joint 3D Semantic-Instance Segmentation via Coupled Feature Selection [46.922236354885]
We propose a novel 3D point clouds segmentation framework, named 3DCFS, that jointly performs semantic and instance segmentation.
Inspired by the human scene perception process, we design a novel coupled feature selection module, named CFSM, that adaptively selects and fuses the reciprocal semantic and instance features.
Our 3DCFS outperforms state-of-the-art methods on benchmark datasets in terms of accuracy, speed and computational cost.
arXiv Detail & Related papers (2020-03-01T17:48:17Z)