Human Motion Capture from Loose and Sparse Inertial Sensors with Garment-aware Diffusion Models
- URL: http://arxiv.org/abs/2506.15290v1
- Date: Wed, 18 Jun 2025 09:16:36 GMT
- Title: Human Motion Capture from Loose and Sparse Inertial Sensors with Garment-aware Diffusion Models
- Authors: Andela Ilic, Jiaxi Jiang, Paul Streli, Xintong Liu, Christian Holz
- Abstract summary: We present a new task of full-body human pose estimation using sparse, loosely attached IMU sensors. We develop transformer-based diffusion models to synthesize loose IMU data and estimate human poses from this challenging loose IMU data.
- Score: 25.20942802233326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motion capture using sparse inertial sensors has shown great promise due to its portability and lack of occlusion issues compared to camera-based tracking. Existing approaches typically assume that IMU sensors are tightly attached to the human body. However, this assumption often does not hold in real-world scenarios. In this paper, we present a new task of full-body human pose estimation using sparse, loosely attached IMU sensors. To solve this task, we simulate IMU recordings from an existing garment-aware human motion dataset. We develop transformer-based diffusion models to synthesize loose IMU data and to estimate human poses from this challenging loose IMU data. In addition, we show that incorporating garment-related parameters while training the model on simulated loose data effectively maintains expressiveness and enhances the ability to capture variations introduced by looser or tighter garments. Experiments show that our proposed diffusion methods, trained on simulated and synthetic data, outperform state-of-the-art methods both quantitatively and qualitatively, opening up a promising direction for future research.
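To make the simulation step concrete, below is a minimal sketch (not the paper's implementation) of how synthetic accelerometer and gyroscope readings can be derived from a recorded motion sequence via finite differences. The function name, the gravity convention, and the assumption of a rigidly tracked sensor trajectory are all illustrative; the paper's garment-aware diffusion synthesis of *loose* IMU data goes well beyond this classical kinematics baseline.

```python
import numpy as np

GRAVITY = np.array([0.0, -9.81, 0.0])  # world-frame gravity (m/s^2), y-up convention assumed

def simulate_imu(positions, rotations, dt):
    """Derive synthetic IMU readings from a motion sequence.

    positions: (T, 3) world-frame trajectory of the sensor attachment point.
    rotations: (T, 3, 3) world-from-sensor rotation matrices.
    dt: frame interval in seconds.
    Returns sensor-frame accelerometer and gyroscope readings
    (lengths T-2 and T-1 due to finite differencing).
    """
    # Linear acceleration via central differences; an accelerometer measures
    # specific force R^T (a - g), so subtract gravity and rotate to the sensor frame.
    accel_world = (positions[2:] - 2 * positions[1:-1] + positions[:-2]) / dt**2
    accel_sensor = np.einsum("tij,tj->ti",
                             rotations[1:-1].transpose(0, 2, 1),
                             accel_world - GRAVITY)

    # Angular velocity from consecutive rotations: R_t^T R_{t+1} = exp([w]_x * dt).
    rel = np.einsum("tij,tjk->tik",
                    rotations[:-1].transpose(0, 2, 1), rotations[1:])
    cos_theta = np.clip((np.trace(rel, axis1=1, axis2=2) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # Unnormalized axis from the skew-symmetric part; its norm is 2*sin(theta).
    axis = np.stack([rel[:, 2, 1] - rel[:, 1, 2],
                     rel[:, 0, 2] - rel[:, 2, 0],
                     rel[:, 1, 0] - rel[:, 0, 1]], axis=1)
    # theta / (2*sin(theta)) -> 1/2 as theta -> 0 (small-angle-safe scaling).
    scale = np.where(theta > 1e-8, theta / (2.0 * np.sin(theta) + 1e-12), 0.5)
    gyro_sensor = axis * scale[:, None] / dt
    return accel_sensor, gyro_sensor
```

For loosely worn sensors, the attachment point would follow a simulated garment vertex rather than a body joint, which is precisely the variation the paper's garment-aware pipeline is designed to capture.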
Related papers
- Learning to Track Any Points from Human Motion [55.831218129679144]
We propose an automated pipeline to generate pseudo-labeled training data for point tracking. A point tracking model trained on the AnthroTAP-annotated data achieves state-of-the-art performance on the TAP-Vid benchmark.
arXiv Detail & Related papers (2025-07-08T17:59:58Z)
- UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units [11.911147790899816]
UMotion is an uncertainty-driven, online fusing-all state estimation framework for 3D human shape and pose estimation. It is supported by six integrated, body-worn ultra-wideband (UWB) distance sensors with IMUs.
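The "uncertainty-driven" fusion in this blurb can be illustrated with the textbook precision-weighted combination of two noisy estimates of the same quantity; UMotion's actual filter is considerably more involved, and the function and numbers below are purely hypothetical.

```python
def fuse(mu_a, var_a, mu_b, var_b):
    """Inverse-variance (precision-weighted) fusion of two noisy estimates,
    e.g. an inter-sensor distance propagated from IMU kinematics vs. one
    measured directly by a UWB ranger."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    mu = (w_a * mu_a + w_b * mu_b) / (w_a + w_b)   # weighted mean
    var = 1.0 / (w_a + w_b)                         # fused variance shrinks
    return mu, var

# Hypothetical: IMU-propagated distance 0.92 m (sigma 0.10) vs. UWB 0.85 m (sigma 0.03);
# the fused estimate lands close to the more certain UWB measurement.
mu, var = fuse(0.92, 0.10**2, 0.85, 0.03**2)
```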
arXiv Detail & Related papers (2025-05-14T13:48:36Z)
- MSSIDD: A Benchmark for Multi-Sensor Denoising [55.41612200877861]
We introduce a new benchmark, the Multi-Sensor SIDD dataset, which is the first raw-domain dataset designed to evaluate the sensor transferability of denoising models.
We propose a sensor consistency training framework that enables denoising models to learn the sensor-invariant features.
arXiv Detail & Related papers (2024-11-18T13:32:59Z)
- PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture [40.765438433729344]
We propose PoseAugment, a novel pipeline incorporating VAE-based pose generation and physical optimization.
Given a pose sequence, the VAE module generates an unlimited number of poses with both high fidelity and diversity while preserving the original data distribution.
High-quality IMU data are then synthesized from the augmented poses for training motion capture models.
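As a rough illustration of the VAE half of this pipeline (the physical-optimization stage is omitted, and all layer sizes and dimensions below are assumptions rather than PoseAugment's actual architecture), a sketch like the following samples many plausible variants of a given pose from the learned posterior:

```python
import torch
import torch.nn as nn

class PoseVAE(nn.Module):
    """Minimal pose VAE; sizes are illustrative, not PoseAugment's."""
    def __init__(self, pose_dim=72, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(pose_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 2 * latent_dim))  # -> (mu, logvar)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, pose_dim))
        self.latent_dim = latent_dim

    def augment(self, pose, n_samples=8, temperature=1.0):
        """Sample diverse variants of `pose` from its approximate posterior;
        repeated sampling yields arbitrarily many augmented poses."""
        mu, logvar = self.encoder(pose).chunk(2, dim=-1)
        std = (0.5 * logvar).exp() * temperature
        z = mu + std * torch.randn(n_samples, self.latent_dim)
        return self.decoder(z)

vae = PoseVAE()  # assume weights trained with the usual ELBO objective
augmented = vae.augment(torch.zeros(72), n_samples=16)  # (16, 72) pose variants
```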
arXiv Detail & Related papers (2024-09-21T10:51:16Z)
- C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition [7.139150172150715]
We introduce Cross-modal Transfer Through Time (C3T). C3T preserves temporal information during alignment to better handle dynamic sensor data. Our experiments on various camera+IMU datasets demonstrate that C3T outperforms existing methods in unsupervised modality adaptation (UMA) by at least 8% in accuracy.
arXiv Detail & Related papers (2024-07-23T19:06:44Z)
- Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition [24.217068565936117]
We present a novel method for action recognition that integrates motion data from body-worn IMUs with egocentric video.
To model the complex relations among the multiple IMU devices placed across the body, we exploit their collaborative dynamics.
Experiments show our method can achieve state-of-the-art performance on multiple public datasets.
arXiv Detail & Related papers (2024-07-09T07:53:16Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved into one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs to generate synthetic individual location trajectories (ILTs), which are sequences of variables representing the physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors [17.3834029178939]
This paper introduces a novel human pose estimation approach using sparse inertial sensors.
It leverages a diverse array of real inertial motion capture data from different skeleton formats to improve motion diversity and model generalization.
The approach demonstrates superior performance over state-of-the-art models across five public datasets, notably reducing pose error by 19% on the DIP-IMU dataset.
arXiv Detail & Related papers (2023-12-02T13:17:10Z)
- Layout Sequence Prediction From Noisy Mobile Modality [53.49649231056857]
Trajectory prediction plays a vital role in understanding pedestrian movement for applications such as autonomous driving and robotics.
Current trajectory prediction models depend on long, complete, and accurately observed sequences from visual modalities.
We propose LTrajDiff, a novel approach that treats objects obstructed or out of sight as equally important as those with fully visible trajectories.
arXiv Detail & Related papers (2023-10-09T20:32:49Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF), which learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics [74.1720528573331]
Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.
We propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics.
We validate the effectiveness of DynaDepth by conducting extensive experiments and simulations on the KITTI and Make3D datasets.
arXiv Detail & Related papers (2022-07-11T07:50:22Z)
- Transformer Inertial Poser: Attention-based Real-time Human Motion Reconstruction from Sparse IMUs [79.72586714047199]
We propose an attention-based deep learning method to reconstruct full-body motion from six IMU sensors in real-time.
Our method achieves new state-of-the-art results both quantitatively and qualitatively, while being simple to implement and smaller in size.
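A bare-bones version of such an attention-based IMU-to-pose mapping might look like the sketch below. The layer sizes, the per-frame input layout, the 6D rotation output, and the absence of the causal masking a real-time system needs are all simplifying assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class IMUPoseTransformer(nn.Module):
    """Illustrative sketch: map a window of readings from six IMUs to
    per-frame full-body joint rotations (all dimensions are assumptions)."""
    def __init__(self, n_imus=6, imu_dim=12, pose_dim=24 * 6, d_model=256):
        super().__init__()
        # Each frame stacks, e.g., a 6D orientation + 3D acceleration (+ extras) per IMU.
        self.embed = nn.Linear(n_imus * imu_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, pose_dim)  # 24 joints x 6D rotation

    def forward(self, imu_seq):  # (batch, time, n_imus * imu_dim)
        return self.head(self.encoder(self.embed(imu_seq)))

model = IMUPoseTransformer()
pose = model(torch.randn(2, 60, 6 * 12))  # -> (2, 60, 144) rotations per frame
```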
arXiv Detail & Related papers (2022-03-29T16:24:52Z)
- CROMOSim: A Deep Learning-based Cross-modality Inertial Measurement Simulator [7.50015216403068]
Inertial measurement unit (IMU) data has been utilized in the monitoring and assessment of human mobility.
To mitigate the data scarcity problem, we design CROMOSim, a cross-modality sensor simulator.
It simulates high fidelity virtual IMU sensor data from motion capture systems or monocular RGB cameras.
arXiv Detail & Related papers (2022-02-21T22:30:43Z)