HAR-DoReMi: Optimizing Data Mixture for Self-Supervised Human Activity Recognition Across Heterogeneous IMU Datasets
- URL: http://arxiv.org/abs/2503.13542v1
- Date: Sun, 16 Mar 2025 04:31:58 GMT
- Title: HAR-DoReMi: Optimizing Data Mixture for Self-Supervised Human Activity Recognition Across Heterogeneous IMU Datasets
- Authors: Lulu Ban, Tao Zhu, Xiangqing Lu, Qi Qiu, Wenyong Han, Shuangjian Li, Liming Chen, Kevin I-Kai Wang, Mingxing Nie, Yaping Wan
- Abstract summary: Cross-dataset Human Activity Recognition (HAR) suffers from limited model generalization, hindering its practical deployment. We introduce a data mixture optimization strategy for pre-training HAR models, aiming to improve the recognition performance across heterogeneous datasets. HAR-DoReMi improves accuracy by an average of 6.51% over the current state-of-the-art method while using only approximately 30% to 50% of the data.
- Score: 4.32515027626613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-dataset Human Activity Recognition (HAR) suffers from limited model generalization, hindering its practical deployment. To address this critical challenge, inspired by the success of DoReMi in Large Language Models (LLMs), we introduce a data mixture optimization strategy for pre-training HAR models, aiming to improve the recognition performance across heterogeneous datasets. However, directly applying DoReMi to the HAR field encounters new challenges due to the continuous, multi-channel and intrinsically heterogeneous characteristics of IMU sensor data. To overcome these limitations, we propose a novel framework, HAR-DoReMi, which introduces a masked reconstruction task based on Mean Squared Error (MSE) loss. By replacing the discrete language sequence prediction task in the original DoReMi framework, which relies on the Negative Log-Likelihood (NLL) loss, the proposed framework is inherently more appropriate for handling the continuous and multi-channel characteristics of IMU data. In addition, HAR-DoReMi integrates the Mahony fusion algorithm into the self-supervised HAR pre-training, aiming to mitigate the heterogeneity of varying sensor orientation. This is achieved by estimating the sensor orientation within each dataset and facilitating alignment with a unified coordinate system, thereby improving the cross-dataset generalization ability of the HAR model. Experimental evaluation on multiple cross-dataset HAR transfer tasks demonstrates that HAR-DoReMi improves accuracy by an average of 6.51% over the current state-of-the-art method while using only approximately 30% to 50% of the data. These results confirm the effectiveness of HAR-DoReMi in improving the generalization and data efficiency of pre-training HAR models, underscoring its significant potential to facilitate the practical deployment of HAR technology.
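The abstract's core idea, swapping DoReMi's NLL objective for a masked-reconstruction MSE, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the exponentiated-gradient domain-weight update of the original DoReMi method, driven by each dataset's clipped excess loss (proxy model loss minus reference model loss). The function names, the step size `eta`, and the `smoothing` constant are hypothetical choices made for illustration, and the Mahony orientation alignment step is not shown.

```python
import numpy as np

def masked_mse(x, x_hat, mask):
    """MSE over masked positions only.

    x, x_hat: arrays of shape (time, channels); mask: boolean array of the
    same shape, True where the input was hidden from the model.
    """
    diff = (x - x_hat)[mask]
    return float(np.mean(diff ** 2)) if diff.size else 0.0

def update_domain_weights(alpha, proxy_loss, ref_loss, eta=1.0, smoothing=1e-3):
    """One DoReMi-style exponentiated-gradient step on dataset weights.

    alpha: current mixture weights over datasets (positive, sums to 1).
    proxy_loss, ref_loss: per-dataset masked-MSE losses of the proxy and
    reference models. eta and smoothing are illustrative values (assumptions).
    """
    alpha = np.asarray(alpha, dtype=float)
    # Excess loss: how much worse the proxy is than the reference, clipped at 0.
    excess = np.maximum(np.asarray(proxy_loss, dtype=float)
                        - np.asarray(ref_loss, dtype=float), 0.0)
    # Multiplicative update in log space for numerical stability.
    logits = np.log(alpha) + eta * excess
    logits -= logits.max()
    new_alpha = np.exp(logits)
    new_alpha /= new_alpha.sum()
    # Mix with the uniform distribution so no dataset weight collapses to zero.
    uniform = np.full_like(new_alpha, 1.0 / new_alpha.size)
    return (1.0 - smoothing) * new_alpha + smoothing * uniform

# Usage: three hypothetical IMU datasets starting from a uniform mixture.
alpha = np.full(3, 1.0 / 3.0)
alpha = update_domain_weights(alpha, proxy_loss=[0.5, 0.2, 0.9],
                              ref_loss=[0.3, 0.3, 0.4])
```

Datasets on which the proxy model lags the reference model the most receive more weight, which is the mechanism a data-mixture optimizer of this kind relies on.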
Related papers
- A Scalable Approach to Covariate and Concept Drift Management via Adaptive Data Segmentation [0.562479170374811]
In many real-world applications, continuous machine learning (ML) systems are crucial but prone to data drift.
Traditional drift adaptation methods typically update models using ensemble techniques, often discarding drifted historical data.
We contend that explicitly incorporating drifted data into the model training process significantly enhances model accuracy and robustness.
arXiv Detail & Related papers (2024-11-23T17:35:23Z) - Sensor Data Augmentation from Skeleton Pose Sequences for Improving Human Activity Recognition [5.669438716143601]
Human Activity Recognition (HAR) has not fully capitalized on the proliferation of deep learning.
We propose a novel approach to improve wearable sensor-based HAR by introducing a pose-to-sensor network model.
Our contributions include the integration of simultaneous training, direct pose-to-sensor generation, and a comprehensive evaluation on the MM-Fit dataset.
arXiv Detail & Related papers (2024-04-25T10:13:18Z) - Contrastive Multiple Instance Learning for Weakly Supervised Person ReID [50.04900262181093]
We introduce Contrastive Multiple Instance Learning (CMIL), a novel framework tailored for more effective weakly supervised ReID.
CMIL distinguishes itself by requiring only a single model and no pseudo labels while leveraging contrastive losses.
We release the WL-MUDD dataset, an extension of the MUDD dataset featuring naturally occurring weak labels from the real-world application at PerformancePhoto.co.
arXiv Detail & Related papers (2024-02-12T14:48:31Z) - IMUGPT 2.0: Language-Based Cross Modality Transfer for Sensor-Based Human Activity Recognition [0.19791587637442667]
Cross modality transfer approaches convert existing datasets from a source modality, such as video, to a target modality (IMU).
We introduce two new extensions for IMUGPT that enhance its use for practical HAR application scenarios.
We demonstrate that our diversity metrics can reduce the effort needed for the generation of virtual IMU data by at least 50%.
arXiv Detail & Related papers (2024-02-01T22:37:33Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - A Conditioned Unsupervised Regression Framework Attuned to the Dynamic Nature of Data Streams [0.0]
This paper presents an optimal strategy for streaming contexts with limited labeled data, introducing an adaptive technique for unsupervised regression.
The proposed method leverages a sparse set of initial labels and introduces an innovative drift detection mechanism.
To enhance adaptability, we integrate the ADWIN (ADaptive WINdowing) algorithm with error generalization based on Root Mean Square Error (RMSE).
arXiv Detail & Related papers (2023-12-12T19:23:54Z) - Filling the Missing: Exploring Generative AI for Enhanced Federated Learning over Heterogeneous Mobile Edge Devices [72.61177465035031]
We propose a generative AI-empowered federated learning approach that addresses these challenges by leveraging the idea of FIlling the MIssing (FIMI) portion of local data.
Experiment results demonstrate that FIMI can save up to 50% of the device-side energy to achieve the target global test accuracy.
arXiv Detail & Related papers (2023-10-21T12:07:04Z) - PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce.
We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD.
Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z) - Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation [84.82153655786183]
We propose a novel framework called Informative Data Mining (IDM) to enable efficient one-shot domain adaptation for semantic segmentation.
IDM provides an uncertainty-based selection criterion to identify the most informative samples, which facilitates quick adaptation and reduces redundant training.
Our approach outperforms existing methods and achieves a new state-of-the-art one-shot performance of 56.7%/55.4% on the GTA5/SYNTHIA to Cityscapes adaptation tasks.
arXiv Detail & Related papers (2023-09-25T15:56:01Z) - Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - A Deep Learning Method for Complex Human Activity Recognition Using Virtual Wearable Sensors [22.923108537119685]
Sensor-based human activity recognition (HAR) is now a research hotspot in multiple application areas.
We propose a novel method based on deep learning for complex HAR in real-world scenes.
The proposed method can surprisingly converge in a few iterations and achieve an accuracy of 91.15% on a real IMU dataset.
arXiv Detail & Related papers (2020-03-04T03:31:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.