Motion Projection Consistency Based 3D Human Pose Estimation with
Virtual Bones from Monocular Videos
- URL: http://arxiv.org/abs/2106.14706v1
- Date: Mon, 28 Jun 2021 13:37:57 GMT
- Title: Motion Projection Consistency Based 3D Human Pose Estimation with
Virtual Bones from Monocular Videos
- Authors: Guangming Wang, Honghao Zeng, Ziliang Wang, Zhe Liu, Hesheng Wang
- Abstract summary: The concept of virtual bones is proposed to solve the problem of cumulative error in 3D human pose estimation.
The proposed network predicts real bones and virtual bones simultaneously.
The consistency between the 2D displacement projected from the network's prediction and the real 2D displacement captured by the camera is proposed as a new projection consistency loss for the learning of 3D human pose.
- Score: 16.808244226857745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time 3D human pose estimation is crucial for human-computer interaction.
It is cheap and practical to estimate 3D human pose only from monocular video.
However, recent bone-splicing-based 3D human pose estimation methods suffer
from cumulative error. In this paper, the concept of virtual
bones is proposed to solve such a challenge. The virtual bones are imaginary
bones between non-adjacent joints. They do not exist in reality, but they bring
new loop constraints for the estimation of 3D human joints. The proposed
network in this paper predicts real bones and virtual bones simultaneously.
The final length of real bones is constrained and learned by the loop
constructed by the predicted real bones and virtual bones. Besides, the motion
constraints of joints in consecutive frames are considered. The consistency
between the 2D displacement projected from the network's prediction and the
real 2D displacement captured by the camera is proposed as a new projection
consistency loss for the learning of 3D human pose. The experiments on the
Human3.6M dataset demonstrate the good performance of the proposed method.
Ablation studies demonstrate the effectiveness of the proposed inter-frame
projection consistency constraints and intra-frame loop constraints.
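The two constraints described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact loss formulas, so the pinhole camera model, the L2 norm, the function names, and the three-joint hip/knee/ankle example are all assumptions made for illustration.

```python
from math import sqrt


def add(a, b):
    """Element-wise sum of two equal-length tuples."""
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    """Element-wise difference of two equal-length tuples."""
    return tuple(x - y for x, y in zip(a, b))


def norm(v):
    """Euclidean length of a vector."""
    return sqrt(sum(x * x for x in v))


def loop_residual(real_bones, virtual_bone):
    """Intra-frame loop constraint (assumed form).

    The real bone vectors along a joint chain should sum to the virtual
    bone that connects the chain's two non-adjacent end joints.
    Returns the length of the mismatch vector.
    """
    total = (0.0, 0.0, 0.0)
    for bone in real_bones:
        total = add(total, bone)
    return norm(sub(total, virtual_bone))


def project(point, focal=1.0):
    """Pinhole projection (assumed): camera coordinates, z > 0,
    principal point at the image origin."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def projection_consistency_loss(pred_prev, pred_next, obs_prev, obs_next, focal=1.0):
    """Inter-frame projection consistency (assumed form).

    For each joint, compare the 2D displacement obtained by projecting the
    predicted 3D positions in two consecutive frames with the 2D
    displacement observed in the image. Returns the mean L2 gap.
    """
    total = 0.0
    for p0, p1, o0, o1 in zip(pred_prev, pred_next, obs_prev, obs_next):
        predicted_disp = sub(project(p1, focal), project(p0, focal))
        observed_disp = sub(o1, o0)
        total += norm(sub(predicted_disp, observed_disp))
    return total / len(pred_prev)


# Hypothetical chain hip -> knee -> ankle with a virtual bone hip -> ankle.
thigh, shin = (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)
consistent = loop_residual([thigh, shin], (0.0, 2.0, 0.0))
inconsistent = loop_residual([thigh, shin], (0.0, 2.0, 1.0))

# One joint moving between two frames at depth z = 2.
loss = projection_consistency_loss(
    [(0.0, 0.0, 2.0)], [(1.0, 0.0, 2.0)], [(0.0, 0.0)], [(0.5, 0.5)]
)
```

In this toy example the loop residual is zero when the virtual bone equals the sum of the two real bones and grows as they disagree, which is the sense in which a virtual bone adds a loop constraint on bones that would otherwise only be chained end to end.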
Related papers
- ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos [18.685856290041283]
ARTS surpasses existing state-of-the-art video-based methods in both per-frame accuracy and temporal consistency on popular benchmarks.
A skeleton estimation and disentanglement module is proposed to estimate the 3D skeletons from a video.
The regressor consists of three modules: Temporal Inverse Kinematics (TIK), Bone-guided Shape Fitting (BSF), and Motion-Centric Refinement (MCR).
arXiv Detail & Related papers (2024-10-21T02:06:43Z)
- Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs [15.017274891943162]
Temporal 3D human pose estimation from monocular videos is a challenging task in human-centered computer vision.
Inertial sensors have been introduced to provide a complementary source of information.
It remains challenging to integrate heterogeneous sensor data for producing physically rational 3D human poses.
arXiv Detail & Related papers (2024-04-27T09:02:42Z)
- Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion Modeling [83.76377808476039]
We propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior.
Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton and the skeleton deformation of each frame.
A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from a sequence of 2D observations.
arXiv Detail & Related papers (2023-08-18T16:41:57Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- 3D Human Pose Regression using Graph Convolutional Network [68.8204255655161]
We propose a graph convolutional network named PoseGraphNet for 3D human pose regression from 2D poses.
Our model's performance is close to the state of the art, but with far fewer parameters.
arXiv Detail & Related papers (2021-05-21T14:41:31Z)
- Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos [33.974241749058585]
We propose a novel framework integrating graph convolutional networks (GCNs) and temporal convolutional networks (TCNs) to robustly estimate camera-centric multi-person 3D poses.
In particular, we introduce a human-joint GCN, which employs the 2D pose estimator's confidence scores to improve the pose estimation results.
The two GCNs work together to estimate the spatial frame-wise 3D poses and can make use of both visible joint and bone information in the target frame to estimate the occluded or missing human-part information.
arXiv Detail & Related papers (2020-12-22T03:01:19Z)
- We are More than Our Joints: Predicting how 3D Bodies Move [63.34072043909123]
We train a novel variational autoencoder that generates motions from latent frequencies.
Experiments show that our method produces state-of-the-art results and realistic 3D body animations.
arXiv Detail & Related papers (2020-12-01T16:41:04Z)
- Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose [70.23652933572647]
We propose a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose.
We show that our Pose2Mesh outperforms the previous 3D human pose and mesh estimation methods on various benchmark datasets.
arXiv Detail & Related papers (2020-08-20T16:01:56Z)
- Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition [92.99291528676021]
Instead of directly regressing the 3D joint locations, we decompose the task into bone direction prediction and bone length prediction.
Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time.
Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets.
arXiv Detail & Related papers (2020-02-24T15:49:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.