We are More than Our Joints: Predicting how 3D Bodies Move
- URL: http://arxiv.org/abs/2012.00619v2
- Date: Fri, 2 Apr 2021 13:04:34 GMT
- Title: We are More than Our Joints: Predicting how 3D Bodies Move
- Authors: Yan Zhang and Michael J. Black and Siyu Tang
- Abstract summary: We train a novel variational autoencoder that generates motions from latent frequencies.
Experiments show that our method produces state-of-the-art results and realistic 3D body animations.
- Score: 63.34072043909123
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A key step towards understanding human behavior is the prediction of 3D human
motion. Successful solutions have many applications in human tracking, HCI, and
graphics. Most previous work focuses on predicting a time series of future 3D
joint locations given a sequence of 3D joints from the past. This Euclidean
formulation generally works better than predicting pose in terms of joint
rotations. Body joint locations, however, do not fully constrain 3D human pose,
leaving degrees of freedom undefined, making it hard to animate a realistic
human from only the joints. Note that the 3D joints can be viewed as a sparse
point cloud. Thus the problem of human motion prediction can be seen as point
cloud prediction. With this observation, we instead predict a sparse set of
locations on the body surface that correspond to motion capture markers. Given
such markers, we fit a parametric body model to recover the 3D shape and pose
of the person. These sparse surface markers also carry detailed information
about human movement that is not present in the joints, increasing the
naturalness of the predicted motions. Using the AMASS dataset, we train MOJO,
a novel variational autoencoder that generates motions from latent
frequencies. MOJO preserves the full temporal resolution of the input motion,
and sampling from the latent frequencies explicitly introduces high-frequency
components into the generated motion. We note that motion prediction methods
accumulate errors over time, resulting in joints or markers that diverge from
true human bodies. To address this, we fit SMPL-X to the predictions at each
time step, projecting the solution back onto the space of valid bodies. These
valid markers are then propagated in time. Experiments show that our method
produces state-of-the-art results and realistic 3D body animations. Code is
available for research purposes at https://yz-cnsdqz.github.io/MOJO/MOJO.html
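To make the two ideas above concrete (one latent code per frequency band, and recursive projection onto valid bodies), here is a minimal sketch. It is illustrative only, not the released MOJO code: decode_markers and fit_body_model are hypothetical placeholders for the trained decoder and the SMPL-X fitting stage, and all sizes are assumptions.

```python
# A minimal, illustrative sketch of MOJO-style generation, NOT the released
# code: decode_markers() and fit_body_model() are hypothetical placeholders
# for the trained decoder and the SMPL-X fitting stage.
import numpy as np
from scipy.fft import idct

N_FRAMES = 60      # predicted horizon (illustrative)
N_MARKERS = 67     # marker-set size (illustrative)
LATENT_DIM = 16    # latent size per frequency band (illustrative)

def sample_latent_frequencies(rng):
    """Draw one latent code per DCT frequency band; sampling every band,
    including the high ones, injects high-frequency motion detail."""
    return rng.standard_normal((N_FRAMES, LATENT_DIM))

def decode_markers(z):
    """Hypothetical decoder: map per-frequency latents to a marker
    trajectory via a linear readout and the inverse DCT over time."""
    readout = np.random.default_rng(0).standard_normal((LATENT_DIM, N_MARKERS * 3))
    spectrum = z @ readout                        # (N_FRAMES, N_MARKERS*3)
    traj = idct(spectrum, axis=0, norm="ortho")   # back to the time domain
    return traj.reshape(N_FRAMES, N_MARKERS, 3)

def fit_body_model(markers_t):
    """Placeholder for the SMPL-X fit: project one frame of markers onto
    the space of valid bodies and return the regenerated markers.
    An identity map here, purely for illustration."""
    return markers_t

rng = np.random.default_rng(42)
raw = decode_markers(sample_latent_frequencies(rng))

# Per-frame projection: in the real pipeline the fitted ("valid") markers
# are propagated, i.e. fed back into the recurrent decoder; here we simply
# project each decoded frame for illustration.
valid = np.empty_like(raw)
for t in range(N_FRAMES):
    valid[t] = fit_body_model(raw[t])
print(valid.shape)  # (60, 67, 3)
```

Sampling a latent per frequency band, rather than a single sequence-level code, is what lets high-frequency components enter the generated motion while preserving the full temporal resolution of the input.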
Related papers
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion [43.95997922499137]
WHAM (World-grounded Humans with Accurate Motion) reconstructs 3D human motion in a global coordinate system from video.
WHAM uses camera angular velocity, estimated by a SLAM method, together with human motion to estimate the body's global trajectory.
It outperforms all existing 3D human motion recovery methods across multiple in-the-wild benchmarks.
arXiv Detail & Related papers (2023-12-12T18:57:46Z)
- FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video Observations [26.693664045454526]
We present a generative approach to forecast long-term future human behavior in 3D, requiring only weak supervision from readily available 2D human action data.
We jointly predict high-level coarse action labels together with their low-level fine-grained realizations as characteristic 3D human poses.
Our experiments demonstrate the complementary nature of joint action and 3D pose prediction.
arXiv Detail & Related papers (2022-11-25T18:59:53Z)
- DMMGAN: Diverse Multi Motion Prediction of 3D Human Joints using Attention-Based Generative Adversarial Network [9.247294820004143]
We propose a transformer-based generative model for forecasting multiple diverse human motions.
Our model first predicts the pose of the body relative to the hip joint. Then the Hip Prediction Module predicts the trajectory of the hip movement for each predicted pose frame (see the sketch below).
We show that our system outperforms the state of the art in human motion prediction while predicting diverse multi-motion future trajectories with hip movements.
arXiv Detail & Related papers (2022-09-13T23:22:33Z)
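As a rough illustration of that decomposition (not the DMMGAN implementation), the sketch below composes global motion from a hip-relative pose sequence and a separately predicted hip trajectory; all names and shapes are assumptions.

```python
# Illustrative only, not the DMMGAN implementation: compose global motion
# from a hip-relative pose sequence and a separately predicted hip
# trajectory. All names and shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, J = 25, 22                                 # frames, joints (illustrative)
local_pose = rng.standard_normal((T, J, 3))   # stand-in: pose relative to the hip
hip_traj = np.cumsum(0.01 * rng.standard_normal((T, 3)), axis=0)  # stand-in hip path

# Global joints = hip trajectory broadcast onto the hip-relative pose.
global_pose = local_pose + hip_traj[:, None, :]
print(global_pose.shape)  # (25, 22, 3)
```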
- Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation [13.40702053084305]
We present a temporally embedded 3D human body pose and shape estimation (TePose) method to improve the accuracy and temporal consistency of pose estimation in live stream videos.
A multi-scale convolutional network is presented as the motion discriminator for adversarial training using datasets without any 3D labeling.
arXiv Detail & Related papers (2022-07-25T21:21:59Z)
- 3D Skeleton-based Human Motion Prediction with Manifold-Aware GAN [3.1313293632309827]
We propose a novel solution for 3D skeleton-based human motion prediction.
We build a manifold-aware Wasserstein generative adversarial model that captures the temporal and spatial dependencies of human motion.
Experiments are conducted on the CMU MoCap and Human3.6M datasets.
arXiv Detail & Related papers (2022-03-01T20:49:13Z)
- Learning Motion Priors for 4D Human Body Capture in 3D Scenes [81.54377747405812]
We propose LEMO: LEarning human MOtion priors for 4D human body capture.
We introduce a novel motion prior, which reduces the jitter exhibited by poses recovered over a sequence (see the sketch below).
We also design a contact friction term and a contact-aware motion infiller obtained via per-instance self-supervised training.
With our pipeline, we demonstrate high-quality 4D human body capture, reconstructing smooth motions and physically plausible body-scene interactions.
arXiv Detail & Related papers (2021-08-23T20:47:09Z)
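LEMO's motion prior is learned; as a generic stand-in for the jitter-reduction idea, the sketch below scores a sequence by its finite-difference acceleration, which a smoothness prior would penalize. Everything here is illustrative, not LEMO's actual prior.

```python
# A generic smoothness objective illustrating the idea of a jitter-reducing
# motion prior: penalize the second temporal difference (acceleration) of
# recovered joint positions. A hand-rolled stand-in, not LEMO's learned prior.
import numpy as np

def jitter_energy(joints):
    """joints: (T, J, 3) joint positions over a sequence."""
    accel = joints[2:] - 2 * joints[1:-1] + joints[:-2]  # finite-difference acceleration
    return float(np.sum(accel ** 2))

smooth = np.tile(np.linspace(0, 1, 30)[:, None, None], (1, 5, 3))  # linear motion
noisy = smooth + 0.01 * np.random.default_rng(0).standard_normal(smooth.shape)
print(jitter_energy(smooth), "<", jitter_energy(noisy))  # smooth motion scores lower
```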
- Perpetual Motion: Generating Unbounded Human Motion [61.40259979876424]
We focus on long-term prediction; that is, generating long sequences of plausible human motion.
We propose a model to generate non-deterministic, ever-changing, perpetual human motion.
The model is trained using a heavy-tailed function of the KL divergence of a white-noise Gaussian process, which allows temporal dependency in the latent sequence.
arXiv Detail & Related papers (2020-07-27T21:50:36Z)
- Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors.
We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)
- Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition [92.99291528676021]
Instead of directly regressing the 3D joint locations, we decompose the task into bone direction prediction and bone length prediction.
Our motivation is that the bone lengths of a human skeleton remain consistent across time (see the sketch below).
Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets.
arXiv Detail & Related papers (2020-02-24T15:49:37Z)
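As a rough illustration of that decomposition, the sketch below composes 3D joint locations from per-bone unit directions and fixed bone lengths along a toy kinematic chain; the skeleton topology and lengths are assumptions, not the paper's.

```python
# Illustrative only: compose 3D joint locations from predicted bone
# directions and fixed bone lengths along a toy kinematic chain.
import numpy as np

PARENTS = [-1, 0, 1, 2, 3]                          # simple chain: root -> ... -> end
lengths = np.array([0.0, 0.45, 0.45, 0.25, 0.15])   # metres, constant across time

def compose_joints(directions):
    """directions: (J, 3) unit vector per bone; returns (J, 3) joint positions."""
    joints = np.zeros_like(directions)
    for j, p in enumerate(PARENTS):
        if p >= 0:  # the root (parent -1) stays at the origin
            joints[j] = joints[p] + lengths[j] * directions[j]
    return joints

dirs = np.random.default_rng(1).standard_normal((5, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # normalise to unit length
print(compose_joints(dirs))
```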