HULC: 3D Human Motion Capture with Pose Manifold Sampling and Dense Contact Guidance
- URL: http://arxiv.org/abs/2205.05677v1
- Date: Wed, 11 May 2022 17:59:31 GMT
- Title: HULC: 3D Human Motion Capture with Pose Manifold Sampling and Dense Contact Guidance
- Authors: Soshi Shimada, Vladislav Golyanik, Patrick Pérez, Weipeng Xu, Christian Theobalt
- Abstract summary: Marker-less monocular 3D human motion capture (MoCap) with scene interactions is a challenging research topic relevant for extended reality, robotics and virtual avatar generation.
We propose HULC, a new approach for 3D human MoCap which is aware of the scene geometry.
- Score: 82.09463058198546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Marker-less monocular 3D human motion capture (MoCap) with scene interactions
is a challenging research topic relevant for extended reality, robotics and
virtual avatar generation. Due to the inherent depth ambiguity of monocular
settings, 3D motions captured with existing methods often contain severe
artefacts such as incorrect body-scene inter-penetrations, jitter and body
floating. To tackle these issues, we propose HULC, a new approach for 3D human
MoCap which is aware of the scene geometry. HULC estimates 3D poses and dense
body-environment surface contacts for improved 3D localisations, as well as the
absolute scale of the subject. Furthermore, we introduce a 3D pose trajectory
optimisation based on a novel pose manifold sampling that resolves erroneous
body-environment inter-penetrations. Although the proposed method requires less
structured inputs compared to existing scene-aware monocular MoCap algorithms,
it produces more physically plausible poses: HULC significantly and
consistently outperforms the existing approaches in various experiments and on
different metrics.
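To make the pose-manifold-sampling idea concrete, here is a minimal, hypothetical sketch rather than the paper's implementation: `decode_pose` stands in for a learned pose-manifold decoder (e.g. a VAE), the scene is reduced to a flat floor, and the joint index and contact point in the example are invented. The energy terms are simplified stand-ins for the paper's penetration and dense-contact objectives.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode_pose(z):
    """Placeholder for a learned manifold decoder: latent -> (24, 3) joints."""
    return z.reshape(-1, 3)

def penetration_energy(joints, floor_y=0.0):
    """Penalise joints that sink below the scene surface (here: a flat floor)."""
    below = np.minimum(joints[:, 1] - floor_y, 0.0)
    return np.sum(below ** 2)

def contact_energy(joints, contacts):
    """Pull joints labelled as in-contact towards their scene contact points."""
    return sum(np.sum((joints[j] - p) ** 2) for j, p in contacts.items())

def sample_best_pose(contacts, n_samples=256, latent_dim=72):
    """Draw pose candidates from the manifold and keep the lowest-energy one."""
    best, best_cost = None, np.inf
    for _ in range(n_samples):
        z = rng.standard_normal(latent_dim)      # sample on the pose manifold
        joints = decode_pose(z)
        cost = penetration_energy(joints) + contact_energy(joints, contacts)
        if cost < best_cost:
            best, best_cost = joints, cost
    return best, best_cost

# Hypothetical example: joint 7 is predicted to touch the floor at this point.
pose, cost = sample_best_pose({7: np.array([0.3, 0.0, 0.1])})
print(cost)
```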
Related papers
- Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses [9.529416246409355]
We present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input.
As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation.
arXiv Detail & Related papers (2024-04-22T17:59:50Z)
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
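As a rough illustration of DO3D's decomposition idea (not its implementation), the 2D motion of a pixel can be explained by applying camera ego-motion to the backprojected 3D point and then adding the object's own 3D motion; the intrinsics, ego-motion, and object motion below are invented numbers.

```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],     # assumed pinhole intrinsics
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

def backproject(u, v, depth):
    """Pixel plus predicted depth -> 3D point in the camera frame."""
    return depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))

def move_point(p, R_ego, t_ego, object_motion):
    """Apply camera ego-motion, then the object's own 3D motion."""
    return R_ego @ p + t_ego + object_motion

def project(p):
    uvw = K @ p
    return uvw[:2] / uvw[2]

# A point at pixel (400, 260), 3 m away, on an object moving 5 cm along x,
# while the camera moves 2 cm forward (points shift by -2 cm in camera z).
p0 = backproject(400, 260, 3.0)
p1 = move_point(p0, np.eye(3), np.array([0.0, 0.0, -0.02]),
                np.array([0.05, 0.0, 0.0]))
print(project(p1) - np.array([400.0, 260.0]))   # resulting 2D flow
```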
- Decaf: Monocular Deformation Capture for Face and Hand Interactions [77.75726740605748]
This paper introduces the first method that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos.
We model hands as articulated objects inducing non-rigid face deformations during an active interaction.
Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system.
arXiv Detail & Related papers (2023-09-28T17:59:51Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
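A minimal sketch of the depth-anchoring idea in the entry above, under strong assumptions (a roughly fronto-parallel torso of known prior length and a known focal length): 2D joints provide metric depth anchors, which then fix the scale and shift of a normalized disparity map. All constants are illustrative.

```python
import numpy as np

f = 1200.0        # focal length in pixels (assumed)
L_TORSO = 0.55    # prior torso length in metres (assumed)

def person_depth(neck_2d, pelvis_2d):
    """z ~= f * L / l for a near fronto-parallel bone of 2D length l."""
    l2d = np.linalg.norm(np.asarray(neck_2d) - np.asarray(pelvis_2d))
    return f * L_TORSO / l2d

# Two detected people; the larger apparent torso is the closer person.
z1 = person_depth((640, 300), (640, 520))   # 220 px torso -> 3.0 m
z2 = person_depth((300, 350), (300, 460))   # 110 px torso -> 6.0 m

# Fit d_metric = s * d_norm + b through the anchors (disparity ~ 1/depth)
# to resolve the scale/shift ambiguity of a normalized disparity map.
d_norm = np.array([0.62, 0.35])             # network disparity at each anchor
A = np.stack([d_norm, np.ones_like(d_norm)], axis=1)
s, b = np.linalg.lstsq(A, 1.0 / np.array([z1, z2]), rcond=None)[0]
print(z1, z2, s, b)
```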
- MoCapDeform: Monocular 3D Human Motion Capture in Deformable Scenes [133.3300573151597]
MoCapDeform is a new framework for monocular 3D human motion capture.
It is the first to explicitly model non-rigid deformations of a 3D scene.
It achieves higher accuracy than competing methods on several datasets.
arXiv Detail & Related papers (2022-08-17T17:59:54Z)
- Learning Motion Priors for 4D Human Body Capture in 3D Scenes [81.54377747405812]
We propose LEMO: LEarning human MOtion priors for 4D human body capture.
We introduce a novel motion prior, which reduces the jitters exhibited by poses recovered over a sequence.
We also design a contact friction term and a contact-aware motion infiller obtained via per-instance self-supervised training.
With our pipeline, we demonstrate high-quality 4D human body capture, reconstructing smooth motions and physically plausible body-scene interactions.
arXiv Detail & Related papers (2021-08-23T20:47:09Z)
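LEMO's motion prior is learned; as a hand-written stand-in, the classic finite-difference acceleration penalty below captures the same intent of suppressing frame-to-frame jitter in a recovered pose sequence.

```python
import numpy as np

def acceleration_energy(poses):
    """poses: (T, J, 3) joint trajectories; penalise second differences."""
    accel = poses[2:] - 2.0 * poses[1:-1] + poses[:-2]
    return np.sum(accel ** 2)

T, J = 60, 24
t = np.linspace(0.0, 2.0 * np.pi, T)
smooth = np.stack([np.stack([np.sin(t), np.cos(t), t], axis=1)] * J, axis=1)
jittery = smooth + 0.01 * np.random.default_rng(0).standard_normal(smooth.shape)
print(acceleration_energy(smooth), acceleration_energy(jittery))  # smooth << jittery
```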
- Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution [34.301501457959056]
We propose a temporal regression network with a gated convolution module to transform 2D joints to 3D.
A simple yet effective localization approach is also used to transform the normalized pose into a global trajectory.
Our proposed method outperforms most state-of-the-art 2D-to-3D pose estimation methods.
arXiv Detail & Related papers (2020-10-31T04:35:24Z)
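The entry above lifts 2D joints to 3D with gated convolutions. The paper's exact architecture is not reproduced here; the sketch below only shows the standard gating pattern (a feature branch modulated by a sigmoid gate branch) on a window of 2D joint sequences, with all sizes assumed.

```python
import torch
import torch.nn as nn

class GatedTemporalConv(nn.Module):
    """Temporal convolution whose output is modulated by a learned gate."""
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        pad = kernel // 2
        self.feat = nn.Conv1d(in_ch, out_ch, kernel, padding=pad)
        self.gate = nn.Conv1d(in_ch, out_ch, kernel, padding=pad)

    def forward(self, x):                 # x: (batch, channels, time)
        return self.feat(x) * torch.sigmoid(self.gate(x))

# Lift an 81-frame window of 17 2D joints (17 * 2 = 34 channels, assumed).
x = torch.randn(1, 34, 81)
block = GatedTemporalConv(34, 128)
print(block(x).shape)                     # torch.Size([1, 128, 81])
```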
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.