MotioNet: 3D Human Motion Reconstruction from Monocular Video with
Skeleton Consistency
- URL: http://arxiv.org/abs/2006.12075v1
- Date: Mon, 22 Jun 2020 08:50:09 GMT
- Title: MotioNet: 3D Human Motion Reconstruction from Monocular Video with
Skeleton Consistency
- Authors: Mingyi Shi, Kfir Aberman, Andreas Aristidou, Taku Komura, Dani
Lischinski, Daniel Cohen-Or, Baoquan Chen
- Abstract summary: We introduce MotioNet, a deep neural network that directly reconstructs the motion of a 3D human skeleton from monocular video.
Our method is the first data-driven approach that directly outputs a kinematic skeleton, which is a complete, commonly used, motion representation.
- Score: 72.82534577726334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce MotioNet, a deep neural network that directly reconstructs the
motion of a 3D human skeleton from monocular video. While previous methods rely
on either rigging or inverse kinematics (IK) to associate a consistent skeleton
with temporally coherent joint rotations, our method is the first data-driven
approach that directly outputs a kinematic skeleton, which is a complete,
commonly used, motion representation. At the crux of our approach lies a deep
neural network with embedded kinematic priors, which decomposes sequences of 2D
joint positions into two separate attributes: a single, symmetric skeleton,
encoded by bone lengths, and a sequence of 3D joint rotations associated with
global root positions and foot contact labels. These attributes are fed into an
integrated forward kinematics (FK) layer that outputs 3D positions, which are
compared to a ground truth. In addition, an adversarial loss is applied to the
velocities of the recovered rotations, to ensure that they lie on the manifold
of natural joint rotations. The key advantage of our approach is that it learns
to infer natural joint rotations directly from the training data, rather than
assuming an underlying model, or inferring them from joint positions using a
data-agnostic IK solver. We show that enforcing a single consistent skeleton
along with temporally coherent joint rotations constrains the solution space,
leading to a more robust handling of self-occlusions and depth ambiguities.
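The abstract's central mechanism, a forward-kinematics (FK) layer that converts the predicted bone lengths and joint rotations back into 3D joint positions, can be sketched in a few lines. The snippet below is a minimal illustration, not MotioNet's actual layer: it uses single-angle rotations about Z and a Y-aligned bone offset for brevity, whereas a real FK layer would use full 3D rotations (e.g. quaternions or 6D representations) per joint; the function names and joint layout are hypothetical.

```python
import math

def rot_z(theta):
    """3x3 rotation about the Z axis (stand-in for a full 3D joint rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def forward_kinematics(parents, bone_lengths, angles, root_pos):
    """Accumulate rotations down the kinematic tree to get 3D joint positions.

    parents[j] is the index of joint j's parent (-1 for the root);
    bone_lengths[j] is the length of the bone from parents[j] to j;
    angles[j] is joint j's local rotation. Joints must be ordered so that
    every parent precedes its children.
    """
    n = len(parents)
    positions = [None] * n
    global_rots = [None] * n
    positions[0] = list(root_pos)
    global_rots[0] = rot_z(angles[0])
    for j in range(1, n):
        p = parents[j]
        # Compose the parent's global rotation with this joint's local one.
        global_rots[j] = mat_mul(global_rots[p], rot_z(angles[j]))
        # Bone offset points along local Y, scaled by the predicted bone length.
        offset = mat_vec(global_rots[j], [0.0, bone_lengths[j], 0.0])
        positions[j] = [positions[p][i] + offset[i] for i in range(3)]
    return positions

# A 3-joint chain with zero rotations stacks the joints along the Y axis.
chain = forward_kinematics(parents=[-1, 0, 1],
                           bone_lengths=[0.0, 1.0, 1.0],
                           angles=[0.0, 0.0, 0.0],
                           root_pos=[0.0, 0.0, 0.0])
```

Because the whole computation is differentiable, the loss on the FK output positions propagates gradients back to both the bone-length and rotation predictions, which is what lets the network learn rotations without an explicit IK solver.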
Related papers
- Learning Localization of Body and Finger Animation Skeleton Joints on Three-Dimensional Models of Human Bodies [0.0]
Our work proposes a solution to the problem of positioning body and finger animation skeleton joints within 3D models of human bodies.
By comparing our method with the state-of-the-art, we show that it is possible to achieve significantly better results with a simpler architecture.
arXiv Detail & Related papers (2024-07-11T13:16:02Z) - SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers [57.46911575980854]
We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation.
Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions.
Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations.
arXiv Detail & Related papers (2024-04-19T04:51:18Z) - Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion
Modeling [83.76377808476039]
We propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior.
Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton and a frame-by-frame skeleton deformation.
A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from 2D observations sequence.
arXiv Detail & Related papers (2023-08-18T16:41:57Z) - A Dual-Masked Auto-Encoder for Robust Motion Capture with
Spatial-Temporal Skeletal Token Completion [13.88656793940129]
We propose an adaptive, identity-aware triangulation module to reconstruct 3D joints and identify the missing joints for each identity.
We then propose a Dual-Masked Auto-Encoder (D-MAE) which encodes the joint status with both skeletal-structural and temporal position encoding for trajectory completion.
In order to demonstrate the proposed model's capability in dealing with severe data loss scenarios, we contribute a high-accuracy and challenging motion capture dataset.
arXiv Detail & Related papers (2022-07-15T10:00:43Z) - MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose
Estimation in Video [75.23812405203778]
Recent solutions have been introduced to estimate 3D human pose from a 2D keypoint sequence by considering body joints among all frames globally to learn spatio-temporal correlation.
We propose MixSTE, which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block to learn inter-joint spatial correlation.
In addition, the network output is extended from the central frame to all frames of the input video, improving the coherence between the input and output sequences.
arXiv Detail & Related papers (2022-03-02T04:20:59Z) - 3D Skeleton-based Few-shot Action Recognition with JEANIE is not so
Na\"ive [28.720272938306692]
We propose a Few-shot Learning pipeline for 3D skeleton-based action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt.
arXiv Detail & Related papers (2021-12-23T16:09:23Z) - Attention-Driven Body Pose Encoding for Human Activity Recognition [0.0]
This article proposes a novel attention-based body pose encoding for human activity recognition.
The enriched data complements the 3D body joint position data and improves model performance.
arXiv Detail & Related papers (2020-09-29T22:17:17Z) - Skeleton-based Action Recognition via Spatial and Temporal Transformer
Networks [12.06555892772049]
We propose a novel Spatial-Temporal Transformer network (ST-TR) which models dependencies between joints using the Transformer self-attention operator.
The proposed ST-TR achieves state-of-the-art performance on all datasets when using joints' coordinates as input, and results on-par with state-of-the-art when adding bones information.
arXiv Detail & Related papers (2020-08-17T15:25:40Z) - Skeleton-Aware Networks for Deep Motion Retargeting [83.65593033474384]
We introduce a novel deep learning framework for data-driven motion retargeting between skeletons.
Our approach learns how to retarget without requiring any explicit pairing between the motions in the training set.
arXiv Detail & Related papers (2020-05-12T12:51:40Z) - Anatomy-aware 3D Human Pose Estimation with Bone-based Pose
Decomposition [92.99291528676021]
Instead of directly regressing the 3D joint locations, we decompose the task into bone direction prediction and bone length prediction.
Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time.
Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets.
arXiv Detail & Related papers (2020-02-24T15:49:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.