Imposing Temporal Consistency on Deep Monocular Body Shape and Pose Estimation
- URL: http://arxiv.org/abs/2202.03074v2
- Date: Tue, 8 Feb 2022 16:58:13 GMT
- Title: Imposing Temporal Consistency on Deep Monocular Body Shape and Pose Estimation
- Authors: Alexandra Zimmer, Anna Hilsmann, Wieland Morgenstern, Peter Eisert
- Abstract summary: This paper presents an elegant solution for the integration of temporal constraints in the fitting process.
We derive parameters of a sequence of body models, representing shape and motion of a person, including jaw poses, facial expressions, and finger poses.
Our approach enables the derivation of realistic 3D body models from image sequences, including facial expression and articulated hands.
- Score: 67.23327074124855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate and temporally consistent modeling of human bodies is essential for
a wide range of applications, including character animation, understanding
human social behavior and AR/VR interfaces. Capturing human motion accurately
from a monocular image sequence is still challenging and the modeling quality
is strongly influenced by the temporal consistency of the captured body motion.
Our work presents an elegant solution for the integration of temporal
constraints into the fitting process. This not only increases temporal
consistency but also improves robustness during optimization. In detail, we derive
parameters of a sequence of body models, representing shape and motion of a
person, including jaw poses, facial expressions, and finger poses. We optimize
these parameters over the complete image sequence, fitting one consistent body
shape while imposing temporal consistency on the body motion, assuming linear
body joint trajectories over a short time. Our approach enables the derivation
of realistic 3D body models from image sequences, including facial expression
and articulated hands. In extensive experiments, we show that our approach
results in accurately estimated body shape and motion, also for challenging
movements and poses. Further, we apply it to the special application of sign
language analysis, where accurate and temporally consistent motion modelling is
essential, and show that the approach is well-suited for this kind of
application.
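The abstract's temporal constraint (assuming linear body joint trajectories over a short time) can be read as a zero-acceleration prior: a trajectory that is linear over a short window has a vanishing second-order finite difference. The paper's code is not reproduced here; the following is a minimal illustrative sketch of how such a penalty could enter a sequence-fitting objective, with all names and shapes assumed.

```python
import numpy as np

def temporal_consistency_loss(joints, weight=1.0):
    """Penalize deviation from locally linear joint trajectories.

    joints: array of shape (T, J, 3) -- per-frame 3D joint positions.
    A linear trajectory has zero second-order finite difference in
    time, so we penalize the squared finite-difference acceleration.
    """
    # second-order finite difference along the time axis
    accel = joints[2:] - 2.0 * joints[1:-1] + joints[:-2]
    return weight * np.sum(accel ** 2)

# Perfectly linear motion incurs (numerically) zero penalty.
t = np.linspace(0.0, 1.0, 10)[:, None, None]   # (10, 1, 1) time steps
linear = t * np.ones((1, 4, 3))                # 4 joints moving linearly
print(temporal_consistency_loss(linear))
```

In a full fitting pipeline, a term like this would be summed with per-frame image-evidence losses over the whole sequence, trading data fidelity against motion smoothness via the weight.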
Related papers
- Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence [47.16903508897047]
In this study, we elucidate that variations in human appearance depend not only on the current frame's pose condition but also on past pose states.
We introduce Dyco, a novel method utilizing the delta pose sequence representation for non-rigid deformations.
In addition, our inertia-aware 3D human method is the first to simulate appearance changes caused by inertia at different velocities.
arXiv Detail & Related papers (2024-03-28T06:05:14Z)
- Enhanced Spatio-Temporal Context for Temporally Consistent Robust 3D Human Motion Recovery from Monocular Videos [5.258814754543826]
We propose a novel method for temporally consistent motion estimation from a monocular video.
Instead of using generic ResNet-like features, our method uses a body-aware feature representation and an independent per-frame pose.
Our method attains significantly lower acceleration error and outperforms the existing state-of-the-art methods.
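Acceleration error, as commonly reported in this literature, compares finite-difference accelerations of predicted versus ground-truth joints; a brief illustrative sketch (not code from the cited paper, with assumed shapes and units):

```python
import numpy as np

def acceleration_error(pred, gt, fps=30.0):
    """Mean per-joint acceleration error between two joint sequences.

    pred, gt: arrays of shape (T, J, 3) -- joint positions per frame.
    Accelerations are estimated by second-order finite differences
    and compared frame by frame; lower values mean smoother,
    more temporally consistent predictions.
    """
    dt = 1.0 / fps
    accel_pred = (pred[2:] - 2.0 * pred[1:-1] + pred[:-2]) / dt ** 2
    accel_gt = (gt[2:] - 2.0 * gt[1:-1] + gt[:-2]) / dt ** 2
    return np.mean(np.linalg.norm(accel_pred - accel_gt, axis=-1))
```

Note that a constant positional offset leaves the metric unchanged, since acceleration is invariant to translation; it isolates temporal jitter from absolute position error.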
arXiv Detail & Related papers (2023-11-20T10:53:59Z)
- PoseVocab: Learning Joint-structured Pose Embeddings for Human Avatar Modeling [30.93155530590843]
We present PoseVocab, a novel pose encoding method that can encode high-fidelity human details.
Given multi-view RGB videos of a character, PoseVocab constructs key poses and latent embeddings based on the training poses.
Experiments show that our method outperforms other state-of-the-art baselines.
arXiv Detail & Related papers (2023-04-25T17:25:36Z)
- Drivable Volumetric Avatars using Texel-Aligned Features [52.89305658071045]
Photo telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance.
We propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people.
arXiv Detail & Related papers (2022-07-20T09:28:16Z)
- Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera [49.357174195542854]
A key challenge of learning the dynamics of the appearance lies in the requirement of a prohibitively large amount of observations.
We show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single view video.
arXiv Detail & Related papers (2022-03-24T00:22:03Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- HuMoR: 3D Human Motion Model for Robust Pose Estimation [100.55369985297797]
HuMoR is a 3D Human Motion Model for Robust Estimation of temporal pose and shape.
We introduce a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence.
We demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset.
arXiv Detail & Related papers (2021-05-10T21:04:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.