HiFECap: Monocular High-Fidelity and Expressive Capture of Human
Performances
- URL: http://arxiv.org/abs/2210.05665v1
- Date: Tue, 11 Oct 2022 17:57:45 GMT
- Title: HiFECap: Monocular High-Fidelity and Expressive Capture of Human
Performances
- Authors: Yue Jiang, Marc Habermann, Vladislav Golyanik, Christian Theobalt
- Abstract summary: HiFECap simultaneously captures human pose, clothing, facial expression, and hands from just a single RGB video.
Our method also captures high-frequency details, such as deforming wrinkles on clothing, better than previous works.
- Score: 84.7225785061814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D human performance capture is indispensable for many applications
in computer graphics and vision that enable immersive experiences. However,
detailed capture of humans requires tracking multiple aspects: the skeletal
pose, the dynamically deforming surface (including clothing), hand gestures,
and facial expressions. No existing monocular method allows joint tracking of
all these components. To this end, we propose HiFECap, a new neural human
performance capture approach that simultaneously captures human pose, clothing,
facial expression, and hands from just a single RGB video. We demonstrate that
our proposed network architecture, the carefully designed training strategy,
and the tight integration of parametric face and hand models into a template
mesh enable the capture of all these individual aspects. Importantly, our
method also captures high-frequency details, such as deforming wrinkles on
clothing, better than previous works. Furthermore, we show that HiFECap
outperforms state-of-the-art human performance capture approaches both
qualitatively and quantitatively while, for the first time, capturing all
these aspects of the human.
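To make the "tight integration of parametric face and hand models into a template mesh" concrete, below is a minimal, hypothetical sketch of one way such an attachment could work: rigidly aligning a parametric part mesh (e.g., a face or hand model) to the body template through shared seam vertices. The function names and the seam-based scheme are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: rigidly attach a parametric part mesh (face/hand)
# to a person-specific body template via shared seam vertices.
import numpy as np

def procrustes_align(src: np.ndarray, dst: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Rigid transform (R, t) mapping seam vertices `src` onto `dst` (Kabsch)."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:       # avoid reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

def attach_part(template: np.ndarray, part: np.ndarray,
                seam_template_ids: np.ndarray, seam_part_ids: np.ndarray) -> np.ndarray:
    """Align a part mesh to the body template through the shared seam
    vertices, returning the transformed part vertices."""
    R, t = procrustes_align(part[seam_part_ids], template[seam_template_ids])
    return part @ R.T + t
```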
Related papers
- MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild [32.6521941706907] (arXiv 2024-06-03)
We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos.
We first define a layered neural representation for the entire scene, composited from individual human models and a background model.
We learn the layered neural representation from videos via our layer-wise differentiable volume rendering.
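As a rough illustration of layer-wise differentiable volume rendering, the hedged sketch below merges densities and colors from several per-person layers (plus background) along a single ray and alpha-composites them with standard volume-rendering weights; the shapes and the merging rule are assumptions, not MultiPly's actual code.

```python
# Hedged sketch: composite L scene layers along one ray of N samples.
import torch

def composite_layers(sigmas: torch.Tensor, colors: torch.Tensor,
                     deltas: torch.Tensor) -> torch.Tensor:
    """sigmas: (L, N) per-layer densities, colors: (L, N, 3),
    deltas: (N,) sample spacing along the ray."""
    sigma = sigmas.sum(0)                                         # merged density (N,)
    color = (sigmas[..., None] * colors).sum(0) / (sigma[..., None] + 1e-8)
    alpha = 1.0 - torch.exp(-sigma * deltas)                      # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1 - alpha + 1e-8])[:-1], dim=0)
    weights = alpha * trans                                       # rendering weights
    return (weights[:, None] * color).sum(0)                      # rendered RGB (3,)
```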
- HINT: Learning Complete Human Neural Representations from Limited Viewpoints [69.76947323932107] (arXiv 2024-05-30)
We propose a NeRF-based algorithm able to learn a detailed and complete human model from limited viewing angles.
As a result, our method can reconstruct complete humans even from a few viewing angles, improving PSNR by more than 15%.
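For reference, PSNR is the metric behind the reported gain; a 15% relative improvement at, say, a 24 dB baseline corresponds to roughly 3.6 dB. A minimal reference implementation:

```python
# Minimal PSNR implementation for images normalized to [0, max_val].
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    mse = np.mean((pred - gt) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))
```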
- VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071] (arXiv 2024-05-28)
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability.
An identity-aware appearance controller integrates additional facial information without compromising other appearance details.
A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps.
VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
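A hedged sketch of how the two pose conditions named above, a dense SMPL-X rendering map and a sparse skeleton map, could be fused channel-wise for a ControlNet-style conditioning branch; the module, channel counts, and layer choices are assumptions for illustration, not the paper's architecture.

```python
# Assumed sketch: stack dense and sparse pose maps into one control feature.
import torch
import torch.nn as nn

class PoseConditionEncoder(nn.Module):
    def __init__(self, out_channels: int = 320):
        super().__init__()
        # dense SMPL-X render (3 ch) + sparse skeleton map (3 ch) -> 6 ch in
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, out_channels, 3, padding=1),
        )

    def forward(self, dense_render: torch.Tensor, skeleton_map: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([dense_render, skeleton_map], dim=1))
```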
- VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis [40.869862603815875] (arXiv 2024-03-13)
VLOGGER is a method for audio-driven human video generation from a single input image.
We use a novel diffusion-based architecture that augments text-to-image models with both spatial and temporal controls.
We show applications in video editing and personalization.
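The summary describes augmenting a text-to-image backbone with spatial and temporal controls; one generic way temporal control is commonly added is a self-attention block applied across frames, sketched below. This is an assumption about the general technique, not VLOGGER's actual layer.

```python
# Generic temporal self-attention block (assumed, illustrative).
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, tokens, dim) -> attend across the time axis
        b, t, n, d = x.shape
        h = x.permute(0, 2, 1, 3).reshape(b * n, t, d)
        h = h + self.attn(self.norm(h), self.norm(h), self.norm(h))[0]
        return h.reshape(b, n, t, d).permute(0, 2, 1, 3)
```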
- CapHuman: Capture Your Moments in Parallel Universes [60.06408546134581] (arXiv 2024-02-01)
We present a new framework named CapHuman.
CapHuman encodes identity features and then learns to align them into the latent space.
We introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner.
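As a loose sketch of the "encode then align" idea, the hypothetical module below projects a face-identity embedding into the same token space as the generator's text conditioning so both can be attended jointly; the dimensions and names are assumptions, not CapHuman's code.

```python
# Assumed sketch: align an identity embedding with text conditioning tokens.
import torch
import torch.nn as nn

class IdentityAligner(nn.Module):
    def __init__(self, id_dim: int = 512, ctx_dim: int = 768, n_tokens: int = 4):
        super().__init__()
        self.proj = nn.Linear(id_dim, ctx_dim * n_tokens)
        self.n_tokens, self.ctx_dim = n_tokens, ctx_dim

    def forward(self, id_embed: torch.Tensor, text_ctx: torch.Tensor) -> torch.Tensor:
        # id_embed: (B, id_dim), text_ctx: (B, L, ctx_dim)
        id_tokens = self.proj(id_embed).view(-1, self.n_tokens, self.ctx_dim)
        return torch.cat([text_ctx, id_tokens], dim=1)   # (B, L + n_tokens, ctx_dim)
```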
- GHuNeRF: Generalizable Human NeRF from a Monocular Video [63.741714198481354] (arXiv 2023-08-31)
GHuNeRF learns a generalizable human NeRF model from a monocular video.
We validate our approach on the widely-used ZJU-MoCap dataset.
- Human Performance Capture from Monocular Video in the Wild [50.34917313325813] (arXiv 2021-11-29)
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on 3DPW, an in-the-wild human video dataset.
- Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering [34.80975358673563] (arXiv 2021-09-15)
We propose a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture.
Experiments on the ZJU-MoCap and AIST datasets show that our method significantly outperforms recent generalizable NeRF methods on unseen identities and poses.
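Generalizable human NeRFs of this kind typically rely on pixel-aligned features: a 3D query point is projected into each source view and image features are bilinearly sampled there. The sketch below shows that lookup under simplified camera conventions; it is illustrative, not the paper's code.

```python
# Assumed sketch: pixel-aligned feature lookup across V source views.
import torch
import torch.nn.functional as F

def pixel_aligned_features(point: torch.Tensor, feats: torch.Tensor,
                           K: torch.Tensor, Rt: torch.Tensor) -> torch.Tensor:
    """point: (3,), feats: (V, C, H, W), K: (V, 3, 3), Rt: (V, 3, 4)."""
    p_h = torch.cat([point, torch.ones(1)])             # homogeneous point (4,)
    cam = torch.einsum('vij,j->vi', Rt, p_h)            # camera coords (V, 3)
    pix = torch.einsum('vij,vj->vi', K, cam)            # projected (V, 3)
    uv = pix[:, :2] / pix[:, 2:3]                       # pixel coords (V, 2)
    H, W = feats.shape[-2:]
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,     # normalize to [-1, 1]
                        2 * uv[:, 1] / (H - 1) - 1], -1)
    sampled = F.grid_sample(feats, grid.view(-1, 1, 1, 2), align_corners=True)
    return sampled.view(feats.shape[0], -1)             # (V, C); fuse e.g. by mean
```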
This list is automatically generated from the titles and abstracts of the papers listed on this site.