C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition
- URL: http://arxiv.org/abs/2407.16803v3
- Date: Mon, 09 Jun 2025 15:03:39 GMT
- Title: C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition
- Authors: Abhi Kamboj, Anh Duy Nguyen, Minh N. Do,
- Abstract summary: We introduce Cross-modal Transfer Through Time (C3T)<n>C3T preserves temporal information during alignment to handle dynamic sensor data better.<n>Our experiments on various camera+IMU datasets demonstrate that C3T outperforms existing methods in UMA by at least 8% in accuracy.
- Score: 7.139150172150715
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In order to unlock the potential of diverse sensors, we investigate a method to transfer knowledge between time-series modalities using a multimodal \textit{temporal} representation space for Human Activity Recognition (HAR). Specifically, we explore the setting where the modality used in testing has no labeled data during training, which we refer to as Unsupervised Modality Adaptation (UMA). We categorize existing UMA approaches as Student-Teacher or Contrastive Alignment methods. These methods typically compress continuous-time data samples into single latent vectors during alignment, inhibiting their ability to transfer temporal information through real-world temporal distortions. To address this, we introduce Cross-modal Transfer Through Time (C3T), which preserves temporal information during alignment to handle dynamic sensor data better. C3T achieves this by aligning a set of temporal latent vectors across sensing modalities. Our extensive experiments on various camera+IMU datasets demonstrate that C3T outperforms existing methods in UMA by at least 8% in accuracy and shows superior robustness to temporal distortions such as time-shift, misalignment, and dilation. Our findings suggest that C3T has significant potential for developing generalizable models for time-series sensor data, opening new avenues for various multimodal applications.
Related papers
- FreRA: A Frequency-Refined Augmentation for Contrastive Learning on Time Series Classification [56.925103708982164]
We present a novel perspective from the frequency domain and identify three advantages for downstream classification: global, independent, and compact.<n>We propose the lightweight yet effective Frequency Refined Augmentation (FreRA) tailored for time series contrastive learning on classification tasks.<n>FreRA consistently outperforms ten leading baselines on time series classification, anomaly detection, and transfer learning tasks.
arXiv Detail & Related papers (2025-05-29T07:18:28Z) - Decomposing and Fusing Intra- and Inter-Sensor Spatio-Temporal Signal for Multi-Sensor Wearable Human Activity Recognition [12.359681612030682]
We propose the DecomposeWHAR model to better model the relationships between modality variables.<n>The decomposition creates high-dimensional representations of each intra-sensor variable through the improved Depth Separable Convolution.<n>Our model demonstrates superior performance on three widely used WHAR datasets, significantly outperforming state-of-the-art models.
arXiv Detail & Related papers (2025-01-19T01:52:28Z) - SONNET: Enhancing Time Delay Estimation by Leveraging Simulated Audio [17.811771707446926]
We show that learning based methods can, even based on synthetic data, significantly outperform GCC-PHAT on novel real world data.
We provide our trained model, SONNET, which is runnable in real-time and works on novel data out of the box for many real data applications.
arXiv Detail & Related papers (2024-11-20T10:23:21Z) - OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation [67.56268991234371]
OV-Uni3DETR achieves the state-of-the-art performance on various scenarios, surpassing existing methods by more than 6% on average.
Code and pre-trained models will be released later.
arXiv Detail & Related papers (2024-03-28T17:05:04Z) - Cross-Cluster Shifting for Efficient and Effective 3D Object Detection
in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving.
We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector.
We conduct extensive experiments on the KITTI, runtime, and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD.
arXiv Detail & Related papers (2024-03-10T10:36:32Z) - Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping [12.442574943138794]
The paper explores the industrial multimodal Anomaly Detection (AD) task, which exploits point clouds and RGB images to localize anomalies.
We introduce a novel light and fast framework that learns to map features from one modality to the other on nominal samples.
arXiv Detail & Related papers (2023-12-07T18:41:21Z) - Graph-Aware Contrasting for Multivariate Time-Series Classification [50.84488941336865]
Existing contrastive learning methods mainly focus on achieving temporal consistency with temporal augmentation and contrasting techniques.
We propose Graph-Aware Contrasting for spatial consistency across MTS data.
Our proposed method achieves state-of-the-art performance on various MTS classification tasks.
arXiv Detail & Related papers (2023-09-11T02:35:22Z) - FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level
Gradient Calibration [89.4165092674947]
Multi-modality fusion and multi-task learning are becoming trendy in 3D autonomous driving scenario.
Previous works manually coordinate the learning framework with empirical knowledge, which may lead to sub-optima.
We propose a novel yet simple multi-level gradient calibration learning framework across tasks and modalities during optimization.
arXiv Detail & Related papers (2023-07-31T12:50:15Z) - Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
gait recognition in the wild is a more practical problem that has attracted the attention of the community of multimedia and computer vision.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z) - CMD: Self-supervised 3D Action Representation Learning with Cross-modal
Mutual Distillation [130.08432609780374]
In 3D action recognition, there exists rich complementary information between skeleton modalities.
We propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs.
Our approach outperforms existing self-supervised methods and sets a series of new records.
arXiv Detail & Related papers (2022-08-26T06:06:09Z) - SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video
Anomaly Detection [108.57862846523858]
We revisit the self-supervised multi-task learning framework, proposing several updates to the original method.
We modernize the 3D convolutional backbone by introducing multi-head self-attention modules.
In our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps.
arXiv Detail & Related papers (2022-07-16T19:25:41Z) - MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation [104.48766162008815]
We propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation.
To design a framework that can take full advantage of multi-modality, each modality provides regularized self-supervisory signals to other modalities.
Our regularized pseudo labels produce stable self-learning signals in numerous multi-modal test-time adaptation scenarios.
arXiv Detail & Related papers (2022-04-27T02:28:12Z) - Transfer Learning for Autonomous Chatter Detection in Machining [0.9281671380673306]
Large-amplitude chatter vibrations are one of the most important phenomena in machining processes.
Three challenges can be identified in applying machine learning for chatter detection at large in industry.
These three challenges can be grouped under the umbrella of transfer learning.
arXiv Detail & Related papers (2022-04-11T20:46:06Z) - Averaging Spatio-temporal Signals using Optimal Transport and Soft
Alignments [110.79706180350507]
We show that our proposed loss can be used to define temporal-temporal baryechecenters as Fr'teche means duality.
Experiments on handwritten letters and brain imaging data confirm our theoretical findings.
arXiv Detail & Related papers (2022-03-11T09:46:22Z) - Contrastive predictive coding for Anomaly Detection in Multi-variate
Time Series Data [6.463941665276371]
We propose a Time-series Representational Learning through Contrastive Predictive Coding (TRL-CPC) towards anomaly detection in MVTS data.
First, we jointly optimize an encoder, an auto-regressor and a non-linear transformation function to effectively learn the representations of the MVTS data sets.
arXiv Detail & Related papers (2022-02-08T04:25:29Z) - PSEUDo: Interactive Pattern Search in Multivariate Time Series with
Locality-Sensitive Hashing and Relevance Feedback [3.347485580830609]
PSEUDo is an adaptive feature learning technique for exploring visual patterns in multi-track sequential data.
Our algorithm features sub-linear training and inference time.
We demonstrate superiority of PSEUDo in terms of efficiency, accuracy, and steerability.
arXiv Detail & Related papers (2021-04-30T13:00:44Z) - Deep ConvLSTM with self-attention for human activity decoding using
wearables [0.0]
We propose a deep neural network architecture that captures features of multiple sensor time-series data but also selects important time points.
We show the validity of the proposed approach across different data sampling strategies and demonstrate that the self-attention mechanism gave a significant improvement.
The proposed methods open avenues for better decoding of human activity from multiple body sensors over extended periods time.
arXiv Detail & Related papers (2020-05-02T04:30:31Z) - 3DCFS: Fast and Robust Joint 3D Semantic-Instance Segmentation via
Coupled Feature Selection [46.922236354885]
We propose a novel 3D point clouds segmentation framework, named 3DCFS, that jointly performs semantic and instance segmentation.
Inspired by the human scene perception process, we design a novel coupled feature selection module, named CFSM, that adaptively selects and fuses the reciprocal semantic and instance features.
Our 3DCFS outperforms state-of-the-art methods on benchmark datasets in terms of accuracy, speed and computational cost.
arXiv Detail & Related papers (2020-03-01T17:48:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.