Generating Virtual On-body Accelerometer Data from Virtual Textual
Descriptions for Human Activity Recognition
- URL: http://arxiv.org/abs/2305.03187v1
- Date: Thu, 4 May 2023 22:14:44 GMT
- Title: Generating Virtual On-body Accelerometer Data from Virtual Textual
Descriptions for Human Activity Recognition
- Authors: Zikang Leng, Hyeokhyen Kwon, Thomas Plötz
- Abstract summary: We introduce an automated pipeline that generates 3D human motion sequences via a motion synthesis model, T2M-GPT, which are later converted to streams of virtual IMU data.
We benchmarked our approach on three HAR datasets (RealWorld, PAMAP2, and USC-HAD) and demonstrated that the use of virtual IMU training data generated with our new approach leads to significantly improved HAR model performance.
- Score: 0.6445605125467573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The development of robust, generalized models in human activity recognition
(HAR) has been hindered by the scarcity of large-scale, labeled data sets.
Recent work has shown that virtual IMU data extracted from videos using
computer vision techniques can lead to substantial performance improvements
when combined with small portions of real IMU data for training HAR models.
Inspired by recent advances in motion synthesis from textual descriptions and
connecting Large Language Models (LLMs) to various AI models, we introduce an
automated pipeline that first uses ChatGPT to generate diverse textual
descriptions of activities. These textual descriptions are then used to
generate 3D human motion sequences via a motion synthesis model, T2M-GPT, and
later converted to streams of virtual IMU data. We benchmarked our approach on
three HAR datasets (RealWorld, PAMAP2, and USC-HAD) and demonstrated that the
use of virtual IMU training data generated with our new approach leads to
significantly improved HAR model performance compared to using only real IMU
data. Our approach contributes to the growing field of cross-modality transfer
methods and illustrates how HAR models can be improved through the generation
of virtual training data that requires no manual effort.
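
The abstract outlines a three-stage pipeline: an LLM produces textual descriptions of activities, a text-to-motion model (T2M-GPT) turns each description into a 3D joint trajectory, and the trajectories are converted into virtual accelerometer streams for HAR training. The sketch below is a minimal illustration of that flow, not the authors' implementation: `describe_activity` and `text_to_motion` are hypothetical stand-ins for the ChatGPT and T2M-GPT calls, `WRIST_JOINT` is an assumed skeleton index, and only the joint-trajectory-to-accelerometer conversion (second-order finite differences plus gravity) is spelled out concretely.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # global-frame gravity vector (m/s^2)
WRIST_JOINT = 20                        # assumed joint index; depends on the skeleton convention

def describe_activity(activity: str) -> str:
    # Stand-in for an LLM call (e.g. ChatGPT) that returns a varied textual
    # description of the activity; here we just fill a fixed template.
    return f"a person is {activity} at a steady pace"

def text_to_motion(text: str, n_frames: int = 120, n_joints: int = 22) -> np.ndarray:
    # Stand-in for a text-to-motion model such as T2M-GPT; returns joint
    # positions of shape (T, J, 3). A random walk is used purely as a placeholder.
    return np.cumsum(0.01 * np.random.randn(n_frames, n_joints, 3), axis=0)

def joints_to_virtual_accel(joint_xyz: np.ndarray, fps: float) -> np.ndarray:
    """Convert one joint's trajectory (T, 3) into a virtual accelerometer
    stream (T-2, 3) using second-order central finite differences. Readings
    stay in the global frame; a full pipeline would also rotate them into
    the sensor's local frame."""
    dt = 1.0 / fps
    accel = (joint_xyz[2:] - 2.0 * joint_xyz[1:-1] + joint_xyz[:-2]) / dt ** 2
    return accel - GRAVITY  # report specific force, so a resting sensor reads +1 g

def build_virtual_imu_dataset(activities, texts_per_activity=10, fps=20.0):
    """End-to-end loop: LLM text -> 3D motion -> virtual wrist accelerometry."""
    samples, labels = [], []
    for label, activity in enumerate(activities):
        for _ in range(texts_per_activity):
            text = describe_activity(activity)
            joints = text_to_motion(text)          # (T, J, 3) joint positions
            wrist = joints[:, WRIST_JOINT, :]      # pick an on-body sensor location
            samples.append(joints_to_virtual_accel(wrist, fps))
            labels.append(label)
    return samples, labels

if __name__ == "__main__":
    X, y = build_virtual_imu_dataset(["walking", "running", "climbing stairs"])
    print(len(X), X[0].shape)  # 30 virtual samples of shape (118, 3)
```

In the setting described above, such virtual streams would be mixed with a small amount of real IMU data before training the HAR classifier; the placeholder functions would be replaced by actual ChatGPT prompting and T2M-GPT inference.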
Related papers
- Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators.
To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module.
Experiments demonstrate that DWS can be versatilely applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z)
- DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data [61.62554324594797]
We propose DreamMask, which explores how to generate training data in the open-vocabulary setting, and how to train the model with both real and synthetic data.
In general, DreamMask significantly simplifies the collection of large-scale training data, serving as a plug-and-play enhancement for existing methods.
For instance, when trained on COCO and tested on ADE20K, the model equipped with DreamMask outperforms the previous state-of-the-art by a substantial margin of 2.1% mIoU.
arXiv Detail & Related papers (2025-01-03T19:00:00Z)
- Distillation of Diffusion Features for Semantic Correspondence [23.54555663670558]
We propose a novel knowledge distillation technique to overcome the problem of reduced efficiency.
We show how to use two large vision foundation models and distill the capabilities of these complementary models into one smaller model that maintains high accuracy at reduced computational cost.
Our empirical results demonstrate that our distilled model with 3D data augmentation achieves performance superior to current state-of-the-art methods while significantly reducing computational load and enhancing practicality for real-world applications, such as semantic video correspondence.
arXiv Detail & Related papers (2024-12-04T17:55:33Z)
- VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation [79.00294932026266]
VidMan is a novel framework that employs a two-stage training mechanism to enhance stability and improve data utilization efficiency.
Our framework outperforms the state-of-the-art baseline model GR-1 on the CALVIN benchmark, achieving an 11.7% relative improvement, and demonstrates over 9% precision gains on the OXE small-scale dataset.
arXiv Detail & Related papers (2024-11-14T03:13:26Z)
- 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing [52.68314936128752]
We propose a new paradigm to automatically generate 3D labeled training data by harnessing the power of pretrained large foundation models.
For each target semantic class, we first generate 2D images of a single object with varied structure and appearance via diffusion models and ChatGPT-generated text prompts.
We transform these augmented images into 3D objects and construct virtual scenes by random composition.
arXiv Detail & Related papers (2024-08-25T09:31:22Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs [9.570759294459629]
We propose Multi³Net, a novel multi-modal, multitask, and contrastive learning-based framework to address the issue of limited data.
Our method seeks to enhance wearable HAR performance, especially in recognizing subtle activities.
arXiv Detail & Related papers (2024-06-03T13:28:42Z)
- IMUGPT 2.0: Language-Based Cross Modality Transfer for Sensor-Based Human Activity Recognition [0.19791587637442667]
Cross-modality transfer approaches convert existing datasets from a source modality, such as video, to a target modality (IMU).
We introduce two new extensions for IMUGPT that enhance its use for practical HAR application scenarios.
We demonstrate that our diversity metrics can reduce the effort needed for the generation of virtual IMU data by at least 50%.
arXiv Detail & Related papers (2024-02-01T22:37:33Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- CROMOSim: A Deep Learning-based Cross-modality Inertial Measurement Simulator [7.50015216403068]
Inertial measurement unit (IMU) data has been utilized in the monitoring and assessment of human mobility.
To mitigate the data scarcity problem, we design CROMOSim, a cross-modality sensor simulator.
It simulates high fidelity virtual IMU sensor data from motion capture systems or monocular RGB cameras.
arXiv Detail & Related papers (2022-02-21T22:30:43Z)
- IMUTube: Automatic Extraction of Virtual on-body Accelerometry from Video for Human Activity Recognition [12.91206329972949]
We introduce IMUTube, an automated processing pipeline to convert videos of human activity into virtual streams of IMU data.
These virtual IMU streams represent accelerometry at a wide variety of locations on the human body.
We show how the virtually generated IMU data improves the performance of a variety of models on known HAR datasets (a simplified conversion sketch follows this entry).
arXiv Detail & Related papers (2020-05-29T21:50:38Z)
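
The IMUTube and CROMOSim summaries above both hinge on expressing virtual accelerometry in an on-body sensor's local frame rather than the global frame. The sketch below is a simplified illustration of that single step under the assumption that per-frame joint positions are available: it builds a body-segment frame from two joints and rotates global-frame acceleration into it. It is not the actual IMUTube or CROMOSim implementation, which additionally handle pose estimation from video and signal calibration.

```python
import numpy as np

def segment_frame(prox: np.ndarray, dist: np.ndarray,
                  up: np.ndarray = np.array([0.0, 0.0, 1.0])) -> np.ndarray:
    """Rotation matrix (3x3) for a body segment (e.g. elbow -> wrist):
    x-axis along the segment, y-axis perpendicular to the segment and the
    global up direction, z-axis completing a right-handed frame. Assumes
    the segment is never exactly parallel to `up`."""
    x = dist - prox
    x /= np.linalg.norm(x)
    y = np.cross(up, x)
    y /= np.linalg.norm(y)
    z = np.cross(x, y)
    return np.stack([x, y, z])  # rows are the local axes in global coordinates

def to_sensor_frame(accel_global: np.ndarray,
                    prox_traj: np.ndarray,
                    dist_traj: np.ndarray) -> np.ndarray:
    """Rotate per-frame global accelerations (T, 3) into the local frame of
    the segment defined by two joint trajectories of matching length."""
    out = np.empty_like(accel_global)
    for t in range(accel_global.shape[0]):
        R = segment_frame(prox_traj[t], dist_traj[t])
        out[t] = R @ accel_global[t]  # components along the local segment axes
    return out
```

Because finite-difference acceleration (as in the earlier sketch) is two frames shorter than the joint trajectories, the joint arrays would be trimmed accordingly, e.g. `to_sensor_frame(accel, prox[1:-1], dist[1:-1])`.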
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.