Related papers: HiFECap: Monocular High-Fidelity and Expressive Capture of Human Performances

URL: http://arxiv.org/abs/2210.05665v1
Date: Tue, 11 Oct 2022 17:57:45 GMT
Title: HiFECap: Monocular High-Fidelity and Expressive Capture of Human Performances
Authors: Yue Jiang, Marc Habermann, Vladislav Golyanik, Christian Theobalt
Abstract summary: HiFECap simultaneously captures human pose, clothing, facial expression, and hands just from a single RGB video. Our method also captures high-frequency details, such as deforming wrinkles on the clothes, better than the previous works.
Score: 84.7225785061814
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Monocular 3D human performance capture is indispensable for many applications in computer graphics and vision for enabling immersive experiences. However, detailed capture of humans requires tracking of multiple aspects, including the skeletal pose, the dynamic surface, which includes clothing, hand gestures as well as facial expressions. No existing monocular method allows joint tracking of all these components. To this end, we propose HiFECap, a new neural human performance capture approach, which simultaneously captures human pose, clothing, facial expression, and hands just from a single RGB video. We demonstrate that our proposed network architecture, the carefully designed training strategy, and the tight integration of parametric face and hand models to a template mesh enable the capture of all these individual aspects. Importantly, our method also captures high-frequency details, such as deforming wrinkles on the clothes, better than the previous works. Furthermore, we show that HiFECap outperforms the state-of-the-art human performance capture approaches qualitatively and quantitatively while for the first time capturing all aspects of the human.

Related papers

EVA: Expressive Virtual Avatars from Multi-view Videos [51.33851869426057]
We introduce Expressive Virtual Avatars (EVA), an actor-specific, fully controllable, and expressive human avatar framework. EVA achieves high-fidelity, lifelike renderings in real time while enabling independent control of facial expressions, body movements, and hand gestures. This work represents a significant advancement towards fully drivable digital human models.
arXiv Detail & Related papers (2025-05-21T11:22:52Z)
HumanGif: Single-View Human Diffusion with Generative Prior [25.516544735593087]
We propose HumanGif, a single-view human diffusion model with generative priors. Specifically, we formulate the single-view-based 3D human novel view and pose synthesis as a single-view-conditioned human diffusion process. We show that HumanGif achieves the best perceptual performance, with better generalizability for novel view and pose synthesis.
arXiv Detail & Related papers (2025-02-17T17:55:27Z)
WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction [51.22641018932625]
We present WonderHuman to reconstruct dynamic human avatars from a monocular video for high-fidelity novel view synthesis. Our method achieves SOTA performance in producing photorealistic renderings from the given monocular video.
arXiv Detail & Related papers (2025-02-03T04:43:41Z)
Human Multi-View Synthesis from a Single-View Model: Transferred Body and Face Representations [7.448124739584319]
We propose an innovative framework that leverages transferred body and facial representations for multi-view human synthesis. Specifically, we use a single-view model pretrained on a large-scale human dataset to develop a multi-view body representation. Our approach outperforms the current state-of-the-art methods, achieving superior performance in multi-view human synthesis.
arXiv Detail & Related papers (2024-12-04T04:02:17Z)
MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild [32.6521941706907]
We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. We first define a layered neural representation for the entire scene, composited by individual human and background models. We learn the layered neural representation from videos via our layer-wise differentiable volume rendering.
arXiv Detail & Related papers (2024-06-03T17:59:57Z)
HINT: Learning Complete Human Neural Representations from Limited Viewpoints [69.76947323932107]
We propose a NeRF-based algorithm able to learn a detailed and complete human model from limited viewing angles. As a result, our method can reconstruct complete humans even from a few viewing angles, increasing performance by more than 15% PSNR.
arXiv Detail & Related papers (2024-05-30T05:43:09Z)
VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071]
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability. An identity-aware appearance controller integrates additional facial information without compromising other appearance details. A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps. VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-05-28T13:18:32Z)
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis [40.869862603815875]
VLOGGER is a method for audio-driven human video generation from a single input image. We use a novel diffusion-based architecture that augments text-to-image models with both spatial and temporal controls. We show applications in video editing and personalization.
arXiv Detail & Related papers (2024-03-13T17:59:02Z)
CapHuman: Capture Your Moments in Parallel Universes [60.06408546134581]
We present a new framework named CapHuman. CapHuman encodes identity features and then learns to align them into the latent space. We introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner.
arXiv Detail & Related papers (2024-02-01T14:41:59Z)
GHuNeRF: Generalizable Human NeRF from a Monocular Video [63.741714198481354]
GHuNeRF learns a generalizable human NeRF model from a monocular video. We validate our approach on the widely-used ZJU-MoCap dataset.
arXiv Detail & Related papers (2023-08-31T09:19:06Z)
Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses. Our method outperforms state-of-the-art methods on an in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering [34.80975358673563]
We propose a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture. Experiments on the ZJU-MoCap and AIST datasets show that our method significantly outperforms recent generalizable NeRF methods on unseen identities and poses.
arXiv Detail & Related papers (2021-09-15T17:32:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.