Imposing Temporal Consistency on Deep Monocular Body Shape and Pose Estimation
- URL: http://arxiv.org/abs/2202.03074v2
- Date: Tue, 8 Feb 2022 16:58:13 GMT
- Title: Imposing Temporal Consistency on Deep Monocular Body Shape and Pose Estimation
- Authors: Alexandra Zimmer, Anna Hilsmann, Wieland Morgenstern, Peter Eisert
- Abstract summary: This paper presents an elegant solution for the integration of temporal constraints in the fitting process.
We derive parameters of a sequence of body models, representing shape and motion of a person, including jaw poses, facial expressions, and finger poses.
Our approach enables the derivation of realistic 3D body models from image sequences, including facial expression and articulated hands.
- Score: 67.23327074124855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate and temporally consistent modeling of human bodies is essential for
a wide range of applications, including character animation, understanding
human social behavior and AR/VR interfaces. Capturing human motion accurately
from a monocular image sequence is still challenging and the modeling quality
is strongly influenced by the temporal consistency of the captured body motion.
Our work presents an elegant solution for the integration of temporal
constraints into the fitting process. This not only increases temporal
consistency but also improves robustness during optimization. In detail, we derive
parameters of a sequence of body models, representing shape and motion of a
person, including jaw poses, facial expressions, and finger poses. We optimize
these parameters over the complete image sequence, fitting one consistent body
shape while imposing temporal consistency on the body motion, assuming linear
body joint trajectories over a short time. Our approach enables the derivation
of realistic 3D body models from image sequences, including facial expression
and articulated hands. In extensive experiments, we show that our approach
results in accurately estimated body shape and motion, also for challenging
movements and poses. Further, we apply it to the special application of sign
language analysis, where accurate and temporally consistent motion modelling is
essential, and show that the approach is well-suited for this kind of
application.
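The abstract's temporal constraint (assuming linear body joint trajectories over a short time) can be read as a zero-acceleration prior: a trajectory that is linear over a short window has a vanishing second-order finite difference. The paper's code is not reproduced here; the following is a minimal illustrative sketch of how such a penalty could enter a sequence-fitting objective, with all names and shapes assumed.

```python
import numpy as np

def temporal_consistency_loss(joints, weight=1.0):
    """Penalize deviation from locally linear joint trajectories.

    joints: array of shape (T, J, 3) -- per-frame 3D joint positions.
    A linear trajectory has zero second-order finite difference in
    time, so we penalize the squared finite-difference acceleration.
    """
    # second-order finite difference along the time axis
    accel = joints[2:] - 2.0 * joints[1:-1] + joints[:-2]
    return weight * np.sum(accel ** 2)

# Perfectly linear motion incurs (numerically) zero penalty.
t = np.linspace(0.0, 1.0, 10)[:, None, None]   # (10, 1, 1) time steps
linear = t * np.ones((1, 4, 3))                # 4 joints moving linearly
print(temporal_consistency_loss(linear))
```

In a full fitting pipeline, a term like this would be summed with per-frame image-evidence losses over the whole sequence, trading data fidelity against motion smoothness via the weight.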
Related papers
- Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence [47.16903508897047]
In this study, we elucidate that variations in human appearance depend not only on the current frame's pose condition but also on past pose states.
We introduce Dyco, a novel method utilizing the delta pose sequence representation for non-rigid deformations.
In addition, our inertia-aware 3D human method is the first to simulate appearance changes caused by inertia at different velocities.
arXiv Detail & Related papers (2024-03-28T06:05:14Z)
- Enhanced Spatio-Temporal Context for Temporally Consistent Robust 3D Human Motion Recovery from Monocular Videos [5.258814754543826]
We propose a novel method for temporally consistent motion estimation from a monocular video.
Instead of using generic ResNet-like features, our method uses a body-aware feature representation and an independent per-frame pose.
Our method attains significantly lower acceleration error and outperforms the existing state-of-the-art methods.
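Acceleration error, as commonly reported in this literature, compares finite-difference accelerations of predicted versus ground-truth joints; a brief illustrative sketch (not code from the cited paper, with assumed shapes and units):

```python
import numpy as np

def acceleration_error(pred, gt, fps=30.0):
    """Mean per-joint acceleration error between two joint sequences.

    pred, gt: arrays of shape (T, J, 3) -- joint positions per frame.
    Accelerations are estimated by second-order finite differences
    and compared frame by frame; lower values mean smoother,
    more temporally consistent predictions.
    """
    dt = 1.0 / fps
    accel_pred = (pred[2:] - 2.0 * pred[1:-1] + pred[:-2]) / dt ** 2
    accel_gt = (gt[2:] - 2.0 * gt[1:-1] + gt[:-2]) / dt ** 2
    return np.mean(np.linalg.norm(accel_pred - accel_gt, axis=-1))
```

Note that a constant positional offset leaves the metric unchanged, since acceleration is invariant to translation; it isolates temporal jitter from absolute position error.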
arXiv Detail & Related papers (2023-11-20T10:53:59Z)
- PoseVocab: Learning Joint-structured Pose Embeddings for Human Avatar Modeling [30.93155530590843]
We present PoseVocab, a novel pose encoding method that can encode high-fidelity human details.
Given multi-view RGB videos of a character, PoseVocab constructs key poses and latent embeddings based on the training poses.
Experiments show that our method outperforms other state-of-the-art baselines.
arXiv Detail & Related papers (2023-04-25T17:25:36Z)
- Drivable Volumetric Avatars using Texel-Aligned Features [52.89305658071045]
Photo telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance.
We propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people.
arXiv Detail & Related papers (2022-07-20T09:28:16Z)
- Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera [49.357174195542854]
A key challenge of learning the dynamics of the appearance lies in the requirement of a prohibitively large amount of observations.
We show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single view video.
arXiv Detail & Related papers (2022-03-24T00:22:03Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- HuMoR: 3D Human Motion Model for Robust Pose Estimation [100.55369985297797]
HuMoR is a 3D Human Motion Model for Robust Estimation of temporal pose and shape.
We introduce a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence.
We demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset.
arXiv Detail & Related papers (2021-05-10T21:04:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.