Human Motion Estimation with Everyday Wearables
- URL: http://arxiv.org/abs/2512.21209v1
- Date: Wed, 24 Dec 2025 14:44:51 GMT
- Title: Human Motion Estimation with Everyday Wearables
- Authors: Siqi Zhu, Yixuan Li, Junfu Li, Qi Wu, Zan Wang, Haozhe Ma, Wei Liang,
- Abstract summary: We present EveryWear, a lightweight and practical human motion capture approach based entirely on everyday wearables. We introduce Ego-Elec, a 9-hour real-world dataset covering 56 daily activities across 17 diverse indoor and outdoor environments. Our approach employs a multimodal teacher-student framework that integrates visual cues from egocentric cameras with inertial signals from consumer devices.
- Score: 30.10082832231011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While on-body device-based human motion estimation is crucial for applications such as XR interaction, existing methods often suffer from poor wearability, expensive hardware, and cumbersome calibration, which hinder their adoption in daily life. To address these challenges, we present EveryWear, a lightweight and practical human motion capture approach based entirely on everyday wearables: a smartphone, smartwatch, earbuds, and smart glasses equipped with one forward-facing and two downward-facing cameras, requiring no explicit calibration before use. We introduce Ego-Elec, a 9-hour real-world dataset covering 56 daily activities across 17 diverse indoor and outdoor environments, with ground-truth 3D annotations provided by a motion capture (MoCap) system, to facilitate robust research and benchmarking in this direction. Our approach employs a multimodal teacher-student framework that integrates visual cues from egocentric cameras with inertial signals from consumer devices. By training directly on real-world data rather than synthetic data, our model effectively eliminates the sim-to-real gap that constrains prior work. Experiments demonstrate that our method outperforms baseline models, validating its effectiveness for practical full-body motion estimation.
Related papers
- Interpretable Multimodal Gesture Recognition for Drone and Mobile Robot Teleoperation via Log-Likelihood Ratio Fusion [14.332919759770645]
Vision-based gesture recognition has been explored as one method for hands-free teleoperation. We propose a multimodal gesture recognition framework that integrates inertial data from Apple Watches on both wrists with capacitive sensing signals from custom gloves. We show that our framework achieves performance comparable to a state-of-the-art vision-based baseline.
arXiv Detail & Related papers (2026-02-27T05:52:04Z)
- EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents [85.77432303199176]
We propose EmbodMocap, a portable and affordable data collection pipeline using two moving iPhones. Our key idea is to jointly calibrate dual RGB-D sequences to reconstruct both humans and scenes. Based on the collected data, we empower three embodied AI tasks: monocular human-scene-reconstruction, where we fine-tune feedforward models that output metric-scale, world-space aligned humans and scenes; physics-based character animation, where we prove our data could be used to scale human-object interaction skills and scene-aware motion tracking; and robot motion control, where we train a humanoid robot via
arXiv Detail & Related papers (2026-02-26T16:53:41Z)
- DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos [110.98100817695307]
We introduce DreamDojo, a foundation world model that learns diverse interactions and dexterous controls from 44k hours of egocentric human videos. Our work enables several important applications based on generative world models, including live teleoperation, policy evaluation, and model-based planning.
arXiv Detail & Related papers (2026-02-06T18:49:43Z)
- ActiveUMI: Robotic Manipulation with Active Perception from Robot-Free Human Demonstrations [32.570602111692914]
We present ActiveUMI, a framework for a data collection system that transfers in-the-wild human demonstrations to robots capable of complex bimanual manipulation. ActiveUMI couples a portable VR teleoperation kit with sensorized controllers that mirror the robot's end-effectors. By recording an operator's deliberate head movements via a head-mounted display, our system learns the crucial link between visual attention and manipulation.
arXiv Detail & Related papers (2025-10-02T02:44:21Z)
- Recognizing Actions from Robotic View for Natural Human-Robot Interaction [52.00935005918032]
Natural Human-Robot Interaction (N-HRI) requires robots to recognize human actions at varying distances and states, regardless of whether the robot itself is in motion or stationary. Existing benchmarks for N-HRI fail to address the unique complexities in N-HRI due to limited data, modalities, task categories, and diversity of subjects and environments. We introduce (Action from Robotic View) a large-scale dataset for perception-centric robotic views prevalent in mobile service robots.
arXiv Detail & Related papers (2025-07-30T09:48:34Z)
- LiveHPS++: Robust and Coherent Motion Capture in Dynamic Free Environment [17.832694508927407]
We introduce LiveHPS++, an innovative and effective solution based on a single LiDAR system.
Benefiting from three meticulously designed modules, our method can learn dynamic and kinematic features from human movements.
Our method has proven to significantly surpass existing state-of-the-art methods across various datasets.
arXiv Detail & Related papers (2024-07-13T10:04:45Z)
- Aligning Human Motion Generation with Human Perceptions [51.831338643012444]
We propose a data-driven approach to bridge the gap by introducing a large-scale human perceptual evaluation dataset, MotionPercept, and a human motion critic model, MotionCritic. Our critic model offers a more accurate metric for assessing motion quality and could be readily integrated into the motion generation pipeline.
arXiv Detail & Related papers (2024-07-02T14:01:59Z)
- Daily Physical Activity Monitoring -- Adaptive Learning from Multi-source Motion Sensor Data [17.604797095380114]
In healthcare applications, there is a growing need to develop machine learning models that use data from a single source, such as from a wrist wearable device.
However, the limitation of using single-source data often compromises the model's accuracy, as it fails to capture the full scope of human activities.
We introduce a transfer learning framework that optimizes machine learning models for everyday applications by leveraging multi-source data collected in a laboratory setting.
arXiv Detail & Related papers (2024-05-26T01:08:28Z)
- RealDex: Towards Human-like Grasping for Robotic Dexterous Hand [64.33746404551343]
We introduce RealDex, a pioneering dataset capturing authentic dexterous hand grasping motions infused with human behavioral patterns. RealDex holds immense promise in advancing humanoid robots for automated perception, cognition, and manipulation in real-world scenarios.
arXiv Detail & Related papers (2024-02-21T14:59:46Z)
- Learning Human Action Recognition Representations Without Real Humans [66.61527869763819]
We present a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.
We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks.
Our approach outperforms previous baselines by up to 5%.
arXiv Detail & Related papers (2023-11-10T18:38:14Z)
- Practical Imitation Learning in the Real World via Task Consistency Loss [18.827979446629296]
This paper introduces a self-supervised loss that encourages sim and real alignment both at the feature and action-prediction levels.
We achieve 80% success across ten seen and unseen scenes using only 16.2 hours of teleoperated demonstrations in sim and real.
arXiv Detail & Related papers (2022-02-03T21:43:06Z)
- Human Activity Recognition models using Limited Consumer Device Sensors and Machine Learning [0.0]
Human activity recognition has grown in popularity as its applications in daily life and medical environments have increased.
This paper presents findings for different models that are limited to training on sensor data from smartphones and smartwatches.
Results show promise for models trained strictly using limited sensor data collected from only smartphones and smartwatches coupled with traditional machine learning concepts and algorithms.
arXiv Detail & Related papers (2022-01-21T06:54:05Z)
- Learning Perceptual Locomotion on Uneven Terrains using Sparse Visual Observations [75.60524561611008]
This work aims to exploit the use of sparse visual observations to achieve perceptual locomotion over a range of commonly seen bumps, ramps, and stairs in human-centred environments.
We first formulate the selection of minimal visual input that can represent the uneven surfaces of interest, and propose a learning framework that integrates such exteroceptive and proprioceptive data.
We validate the learned policy in tasks that require omnidirectional walking over flat ground and forward locomotion over terrains with obstacles, showing a high success rate.
arXiv Detail & Related papers (2021-09-28T20:25:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.