Physics Informed Human Posture Estimation Based on 3D Landmarks from Monocular RGB-Videos
- URL: http://arxiv.org/abs/2512.06783v1
- Date: Sun, 07 Dec 2025 10:54:09 GMT
- Title: Physics Informed Human Posture Estimation Based on 3D Landmarks from Monocular RGB-Videos
- Authors: Tobias Leuthold, Michele Xiloyannis, Yves Zimmermann,
- Abstract summary: State-of-the-art models like BlazePose excel in real-time pose tracking, but their lack of anatomical constraints indicates improvement potential by including physical knowledge.<n>We present a real-time post-processing algorithm fusing the strengths of BlazePose 3D and 2D estimations using a weighted optimization.<n> Evaluation using the Physio2.2M dataset shows a 10.2 percent reduction in 3D MPJPE and a 16.6 percent decrease in errors of angles between body segments compared to BlazePose 3D estimation.
- Score: 1.3399128310861848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Applications providing automated coaching for physical training are increasing in popularity, for example physical therapy. These applications rely on accurate and robust pose estimation using monocular video streams. State-of-the-art models like BlazePose excel in real-time pose tracking, but their lack of anatomical constraints indicates improvement potential by including physical knowledge. We present a real-time post-processing algorithm fusing the strengths of BlazePose 3D and 2D estimations using a weighted optimization, penalizing deviations from expected bone length and biomechanical models. Bone length estimations are refined to the individual anatomy using a Kalman filter with adapting measurement trust. Evaluation using the Physio2.2M dataset shows a 10.2 percent reduction in 3D MPJPE and a 16.6 percent decrease in errors of angles between body segments compared to BlazePose 3D estimation. Our method provides a robust, anatomically consistent pose estimation based on a computationally efficient video-to-3D pose estimation, suitable for automated physiotherapy, healthcare, and sports coaching on consumer-level laptops and mobile devices. The refinement runs on the backend with anonymized data only.
Related papers
- Reconstructing Humans with a Biomechanically Accurate Skeleton [55.06027148976482]
We introduce a method for reconstructing 3D humans from a single image using a biomechanically accurate skeleton model.<n>Compared to state-of-the-art methods for 3D human mesh recovery, our model achieves competitive performance on standard benchmarks.
arXiv Detail & Related papers (2025-03-27T17:56:24Z) - AthletePose3D: A Benchmark Dataset for 3D Human Pose Estimation and Kinematic Validation in Athletic Movements [4.653030985708889]
AthletePose3D is a novel dataset designed to capture high-speed, high-acceleration athletic movements.<n>We evaluate state-of-the-art (SOTA) monocular 2D and 3D pose estimation models on the dataset.
arXiv Detail & Related papers (2025-03-10T16:16:02Z) - BioPose: Biomechanically-accurate 3D Pose Estimation from Monocular Videos [6.280386490530478]
BioPose is a learning-based framework for predicting biomechanically accurate 3D human pose directly from monocular videos.<n>It includes a Multi-Query Human Mesh Recovery model (MQ-HMR), a Neural Inverse Kinematics (NeurIK) model, and a 2D-informed pose refinement technique.<n> Experiments on benchmark datasets demonstrate that BioPose significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-01-14T02:56:19Z) - Estimating Body and Hand Motion in an Ego-sensed World [62.61989004520802]
We present EgoAllo, a system for human motion estimation from a head-mounted device.<n>Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters.
arXiv Detail & Related papers (2024-10-04T17:59:57Z) - Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs [15.017274891943162]
Temporal 3D human pose estimation from monocular videos is a challenging task in human-centered computer vision.
Inertial sensor has been introduced to provide complementary source of information.
It remains challenging to integrate heterogeneous sensor data for producing physically rational 3D human poses.
arXiv Detail & Related papers (2024-04-27T09:02:42Z) - 3D Kinematics Estimation from Video with a Biomechanical Model and
Synthetic Training Data [4.130944152992895]
We propose a novel biomechanics-aware network that directly outputs 3D kinematics from two input views.
Our experiments demonstrate that the proposed approach, only trained on synthetic data, outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2024-02-20T17:33:40Z) - Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing [54.29207348918216]
Cloth2Body needs to address new and emerging challenges raised by the partial observation of the input and the high diversity of the output.
We propose an end-to-end framework that can accurately estimate 3D body mesh parameterized by pose and shape from a 2D clothing image.
As shown by experimental results, the proposed framework achieves state-of-the-art performance and can effectively recover natural and diverse 3D body meshes from 2D images.
arXiv Detail & Related papers (2023-09-28T06:18:38Z) - Towards Single Camera Human 3D-Kinematics [15.559206592078425]
We propose a novel approach for direct 3D human kinematic estimation D3KE from videos using deep neural networks.
Our experiments demonstrate that the proposed end-to-end training is robust and outperforms 2D and 3D markerless motion capture based kinematic estimation pipelines.
arXiv Detail & Related papers (2023-01-13T08:44:09Z) - Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z) - Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors.
We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z) - Anatomy-aware 3D Human Pose Estimation with Bone-based Pose
Decomposition [92.99291528676021]
Instead of directly regressing the 3D joint locations, we decompose the task into bone direction prediction and bone length prediction.
Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time.
Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets.
arXiv Detail & Related papers (2020-02-24T15:49:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.