Human Performance Capture from Monocular Video in the Wild
- URL: http://arxiv.org/abs/2111.14672v2
- Date: Tue, 30 Nov 2021 16:03:36 GMT
- Title: Human Performance Capture from Monocular Video in the Wild
- Authors: Chen Guo, Xu Chen, Jie Song and Otmar Hilliges
- Abstract summary: We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on 3DPW, an in-the-wild human video dataset.
- Score: 50.34917313325813
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Capturing the dynamically deforming 3D shape of clothed humans is essential
for numerous applications, including VR/AR, autonomous driving, and
human-computer interaction. Existing methods either require a highly
specialized capturing setup, such as expensive multi-view imaging systems, or
they lack robustness to challenging body poses. In this work, we propose a
method capable of capturing the dynamic 3D human shape from a monocular video
featuring challenging body poses, without any additional input. We first build
a 3D template human model of the subject based on a learned regression model.
We then track this template model's deformation under challenging body
articulations based on 2D image observations. Our method outperforms
state-of-the-art methods on 3DPW, an in-the-wild human video dataset. Moreover,
we demonstrate its robustness and generalizability on videos from the iPER
dataset.
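To make the abstract's two-stage design concrete, here is a deliberately minimal sketch: a regression stage that produces a template, and a tracking stage that deforms it per frame to match 2D observations. Everything below (keypoints standing in for a mesh, the orthographic camera, the gradient-descent tracker, and all helper names) is a hypothetical simplification for illustration, not the authors' implementation.

```python
# Toy sketch of the two-stage idea: (1) obtain a template from a regression
# model, (2) track its deformation per frame against 2D observations.
# The "template" here is just a set of 3D keypoints.
import numpy as np

def regress_template(num_points: int = 24) -> np.ndarray:
    """Stage 1 stand-in: the paper builds a 3D template of the subject with
    a learned regression model; here we fabricate 3D keypoints."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(num_points, 3))

def project(points3d: np.ndarray) -> np.ndarray:
    """Toy orthographic camera: drop the depth coordinate."""
    return points3d[:, :2]

def track_frame(template, obs2d, prev, lam=0.1, lr=0.5, iters=200):
    """Stage 2 stand-in: deform the template so its projection matches the
    2D observations, with a smoothness prior toward the previous frame."""
    x = template.copy()
    for _ in range(iters):
        residual = project(x) - obs2d              # 2D reprojection error
        grad = np.zeros_like(x)
        grad[:, :2] = residual                     # gradient of the data term
        grad += lam * (x - prev)                   # temporal smoothness term
        x -= lr * grad
    return x

template = regress_template()
rng = np.random.default_rng(1)
prev = template
for _ in range(3):                                 # simulated video frames
    obs2d = project(template) + 0.05 * rng.normal(size=(24, 2))  # fake detections
    prev = track_frame(template, obs2d, prev)
```

In the paper itself, the template is a full 3D human model produced by a learned regressor, and tracking recovers deformation under challenging articulations; the sketch only mirrors the overall fit-projection-to-2D structure.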
Related papers
- MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild [32.6521941706907]
We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos.
We first define a layered neural representation for the entire scene, composited by individual human and background models.
We learn the layered neural representation from videos via our layer-wise differentiable volume rendering.
arXiv Detail & Related papers (2024-06-03T17:59:57Z) - Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance [25.346255905155424]
- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance [25.346255905155424]
We introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework.
By representing the 3D human parametric model as the motion guidance, we can perform parametric shape alignment of the human body between the reference image and the source video motion.
Our approach also exhibits superior generalization capabilities on the proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-03-21T18:52:58Z) - Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via
Self-supervised Scene Decomposition [40.46674919612935]
We present Vid2Avatar, a method to learn human avatars from monocular in-the-wild videos.
Our method does not require any ground-truth supervision or priors extracted from large datasets of clothed human scans.
It solves the tasks of scene decomposition and surface reconstruction directly in 3D by modeling both the human and the background in the scene jointly.
arXiv Detail & Related papers (2023-02-22T18:59:17Z) - Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z) - HiFECap: Monocular High-Fidelity and Expressive Capture of Human
- HiFECap: Monocular High-Fidelity and Expressive Capture of Human Performances [84.7225785061814]
HiFECap simultaneously captures human pose, clothing, facial expression, and hands just from a single RGB video.
Our method also captures high-frequency details, such as deforming wrinkles on clothing, better than prior work.
arXiv Detail & Related papers (2022-10-11T17:57:45Z) - Self-Supervised 3D Human Pose Estimation in Static Video Via Neural
Rendering [5.568218439349004]
Inferring 3D human pose from 2D images is a challenging and long-standing problem in the field of computer vision.
We present preliminary results for a method to estimate 3D pose from 2D video containing a single person.
arXiv Detail & Related papers (2022-10-10T09:24:07Z) - 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z) - S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling [103.65625425020129]
- S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling [103.65625425020129]
We represent the pedestrian's shape, pose and skinning weights as neural implicit functions that are directly learned from data.
We demonstrate the effectiveness of our approach on various datasets and show that our reconstructions outperform existing state-of-the-art methods.
arXiv Detail & Related papers (2021-01-17T02:16:56Z) - Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z) - Weakly Supervised 3D Human Pose and Shape Reconstruction with
Normalizing Flows [43.89097619822221]
We present semi-supervised and self-supervised models that support training and good generalization in real-world images and video.
Our formulation is based on kinematic latent normalizing flow representations and dynamics, as well as differentiable, semantic body part alignment loss functions.
In extensive experiments using 3D motion capture datasets such as CMU, Human3.6M, 3DPW, and AMASS, we show that the proposed methods outperform the state of the art.
arXiv Detail & Related papers (2020-03-23T16:11:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.