Scaling Human Activity Recognition: A Comparative Evaluation of Synthetic Data Generation and Augmentation Techniques
- URL: http://arxiv.org/abs/2506.07612v2
- Date: Fri, 13 Jun 2025 13:43:21 GMT
- Title: Scaling Human Activity Recognition: A Comparative Evaluation of Synthetic Data Generation and Augmentation Techniques
- Authors: Zikang Leng, Archith Iyer, Thomas Plötz
- Abstract summary: Human activity recognition (HAR) is often limited by the scarcity of labeled datasets. Recent work has explored generating virtual inertial measurement unit (IMU) data via cross-modality transfer.
- Score: 1.0712226955584796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human activity recognition (HAR) is often limited by the scarcity of labeled datasets due to the high cost and complexity of real-world data collection. To mitigate this, recent work has explored generating virtual inertial measurement unit (IMU) data via cross-modality transfer. While video-based and language-based pipelines have each shown promise, they differ in assumptions and computational cost. Moreover, their effectiveness relative to traditional sensor-level data augmentation remains unclear. In this paper, we present a direct comparison of these two virtual IMU generation approaches with classical data augmentation techniques. We construct a large-scale virtual IMU dataset spanning 100 diverse activities from Kinetics-400 and simulate sensor signals at 22 body locations. The three data generation strategies are evaluated on benchmark HAR datasets (UTD-MHAD, PAMAP2, HAD-AW) using four popular models. Results show that virtual IMU data significantly improves performance over real or augmented data alone, particularly under limited-data conditions. We offer practical guidance on choosing data generation strategies and highlight the distinct advantages and disadvantages of each approach.
Related papers
- AugmentGest: Can Random Data Cropping Augmentation Boost Gesture Recognition Performance? [49.64902130083662]
This paper proposes a comprehensive data augmentation framework that integrates geometric transformations, random variations, rotation, zooming and intensity-based transformations. The proposed augmentation strategy is evaluated on three models: multi-stream e2eET, FPPR point cloud-based hand gesture recognition (HGR), and DD-Network.
arXiv Detail & Related papers (2025-06-08T16:43:05Z)
- Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data [35.431340001608476]
Traditional data mining methods are inadequate when faced with large-scale, high-dimensional and complex data. This study introduces semi-supervised learning methods, aiming to improve the algorithm's ability to utilize unlabeled data. Specifically, we adopt a self-training method and combine it with a convolutional neural network (CNN) for image feature extraction and classification.
arXiv Detail & Related papers (2024-11-27T18:59:50Z)
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we re-visit transformers pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- IMUGPT 2.0: Language-Based Cross Modality Transfer for Sensor-Based Human Activity Recognition [0.19791587637442667]
Cross modality transfer approaches convert existing datasets from a source modality, such as video, to a target modality (IMU).
We introduce two new extensions for IMUGPT that enhance its use for practical HAR application scenarios.
We demonstrate that our diversity metrics can reduce the effort needed for the generation of virtual IMU data by at least 50%.
arXiv Detail & Related papers (2024-02-01T22:37:33Z)
- Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute with regard to previous SoTA) to a significantly new best of 80% single-run success rate on the R2R test split by simple imitation learning.
arXiv Detail & Related papers (2023-07-28T16:03:28Z)
- Generating Virtual On-body Accelerometer Data from Virtual Textual Descriptions for Human Activity Recognition [0.6445605125467573]
We introduce an automated pipeline that generates 3D human motion sequences via a motion synthesis model, T2M-GPT, which are later converted to streams of virtual IMU data.
We benchmarked our approach on three HAR datasets (RealWorld, PAMAP2, and USC-HAD) and demonstrate that the use of virtual IMU training data generated using our new approach leads to significantly improved HAR model performance.
arXiv Detail & Related papers (2023-05-04T22:14:44Z)
- One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data.
Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation.
We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
- Transformer Networks for Data Augmentation of Human Physical Activity Recognition [61.303828551910634]
State-of-the-art models like Recurrent Generative Adversarial Networks (RGAN) are used to generate realistic synthetic data.
In this paper, transformer-based generative adversarial networks, which have global attention over the data, are compared with RGAN on the PAMAP2 and Real World Human Activity Recognition datasets.
arXiv Detail & Related papers (2021-09-02T16:47:29Z)
- IMUTube: Automatic Extraction of Virtual on-body Accelerometry from Video for Human Activity Recognition [12.91206329972949]
We introduce IMUTube, an automated processing pipeline to convert videos of human activity into virtual streams of IMU data.
These virtual IMU streams represent accelerometry at a wide variety of locations on the human body.
We show how the virtually-generated IMU data improves the performance of a variety of models on known HAR datasets.
arXiv Detail & Related papers (2020-05-29T21:50:38Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- A Deep Learning Method for Complex Human Activity Recognition Using Virtual Wearable Sensors [22.923108537119685]
Sensor-based human activity recognition (HAR) is now a research hotspot in multiple application areas.
We propose a novel deep-learning-based method for complex HAR in real-world scenes.
The proposed method can surprisingly converge in a few iterations and achieve an accuracy of 91.15% on a real IMU dataset.
arXiv Detail & Related papers (2020-03-04T03:31:23Z)