GHuNeRF: Generalizable Human NeRF from a Monocular Video
- URL: http://arxiv.org/abs/2308.16576v3
- Date: Tue, 12 Dec 2023 07:50:06 GMT
- Title: GHuNeRF: Generalizable Human NeRF from a Monocular Video
- Authors: Chen Li, Jiahao Lin, Gim Hee Lee
- Abstract summary: GHuNeRF learns a generalizable human NeRF model from a monocular video.
We validate our approach on the widely-used ZJU-MoCap dataset.
- Score: 63.741714198481354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we tackle the challenging task of learning a generalizable
human NeRF model from a monocular video. Although existing generalizable human
NeRFs have achieved impressive results, they require multi-view images or videos,
which might not always be available. On the other hand, some works on
free-viewpoint rendering of humans from monocular videos cannot generalize
to unseen identities. In view of these limitations, we propose GHuNeRF to learn
a generalizable human NeRF model from a monocular video of the human performer.
We first introduce a visibility-aware aggregation scheme to compute vertex-wise
features, which are used to construct a 3D feature volume. The feature volume
can only represent the overall geometry of the human performer with
insufficient accuracy due to its limited resolution. To address this, we further
enhance the volume feature with temporally aligned point-wise features using an
attention mechanism. Finally, the enhanced feature is used to predict the
density and color of each sampled point. A surface-guided sampling strategy is
also adopted to improve the efficiency of both training and inference. We
validate our approach on the widely-used ZJU-MoCap dataset, where we achieve
comparable performance to existing multi-view video-based approaches. We also
test on the monocular People-Snapshot dataset and achieve better performance
than existing works when only monocular video is used. Our code is available at
the project website.
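To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of the named stages: visibility-aware aggregation of per-frame vertex features, a coarse 3D feature volume, attention-based enhancement with temporally aligned point-wise features, a density/color head, and surface-guided sampling. All class and function names (GHuNeRFSketch, surface_guided_samples), tensor shapes, and hyper-parameters are illustrative assumptions, not the authors' released implementation.
```python
# Minimal sketch, assuming a PyTorch setting. Module names, shapes, and
# hyper-parameters are illustrative, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GHuNeRFSketch(nn.Module):
    """Toy version of the described pipeline: visibility-aware vertex feature
    aggregation -> coarse feature volume -> attention-based enhancement with
    temporally aligned point-wise features -> density/color head."""

    def __init__(self, feat_dim: int = 64, vol_res: int = 32):
        super().__init__()
        self.vol_res = vol_res
        # Each query point's coarse volume feature attends to its per-frame,
        # temporally aligned point-wise features (single head for brevity).
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=1, batch_first=True)
        # Small head mapping the enhanced feature to density (1) + RGB (3).
        self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 4))

    def aggregate_vertex_features(self, frame_feats, visibility):
        """Visibility-weighted average over T frames.
        frame_feats: (T, V, C) features sampled at V body vertices per frame.
        visibility:  (T, V)    soft visibility of each vertex in each frame."""
        w = visibility.unsqueeze(-1)                                  # (T, V, 1)
        return (w * frame_feats).sum(0) / w.sum(0).clamp(min=1e-6)    # (V, C)

    def build_volume(self, verts, vert_feats):
        """Scatter vertex features into a coarse R^3 grid and average per cell.
        verts: (V, 3) vertex positions normalized to [-1, 1]."""
        R = self.vol_res
        vol = torch.zeros(R, R, R, vert_feats.shape[-1])
        cnt = torch.zeros(R, R, R, 1)
        idx = ((verts.clamp(-1, 1) + 1) * 0.5 * (R - 1)).round().long()
        ix, iy, iz = idx[:, 0], idx[:, 1], idx[:, 2]
        vol.index_put_((ix, iy, iz), vert_feats, accumulate=True)
        cnt.index_put_((ix, iy, iz), torch.ones(len(verts), 1), accumulate=True)
        return (vol / cnt.clamp(min=1.0)).permute(3, 0, 1, 2)         # (C, R, R, R)

    def query_volume(self, vol, pts):
        """Trilinearly sample the volume at pts: (N, 3) in [-1, 1]."""
        # grid_sample expects (x, y, z) ordered as (W, H, D), hence the flip.
        grid = pts[:, [2, 1, 0]].view(1, 1, 1, -1, 3)
        feat = F.grid_sample(vol.unsqueeze(0), grid, align_corners=True)
        return feat.view(vol.shape[0], -1).t()                        # (N, C)

    def forward(self, frame_feats, visibility, verts, pts, point_feats):
        """point_feats: (N, T, C) temporally aligned per-frame features at pts."""
        vert_feats = self.aggregate_vertex_features(frame_feats, visibility)
        vol = self.build_volume(verts, vert_feats)
        coarse = self.query_volume(vol, pts).unsqueeze(1)             # (N, 1, C)
        enhanced, _ = self.attn(coarse, point_feats, point_feats)     # (N, 1, C)
        out = self.head(enhanced.squeeze(1))                          # (N, 4)
        return out[:, :1], torch.sigmoid(out[:, 1:])                  # sigma, rgb


def surface_guided_samples(verts, n_per_vertex=4, noise_std=0.02):
    """Stand-in for surface-guided sampling: query points are drawn near the
    body surface by jittering the vertices with small Gaussian noise."""
    noise = torch.randn(verts.shape[0], n_per_vertex, 3) * noise_std
    return (verts.unsqueeze(1) + noise).reshape(-1, 3)


if __name__ == "__main__":
    T, V, C, N = 5, 100, 64, 256                  # frames, vertices, channels, points
    model = GHuNeRFSketch(feat_dim=C)
    verts = torch.rand(V, 3) * 2 - 1              # placeholder body vertices
    pts = surface_guided_samples(verts)[:N]
    sigma, rgb = model(
        frame_feats=torch.randn(T, V, C),         # would come from an image encoder
        visibility=torch.rand(T, V),
        verts=verts,
        pts=pts,
        point_feats=torch.randn(N, T, C),
    )
    print(sigma.shape, rgb.shape)                 # (256, 1) and (256, 3)
```
In the actual method the per-frame vertex and point features would be sampled from an image encoder at projected body-model vertices; random tensors stand in for them here.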
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- SHERF: Generalizable Human NeRF from a Single Image [59.10589479808622]
SHERF is the first generalizable Human NeRF model for recovering animatable 3D humans from a single input image.
We propose a bank of 3D-aware hierarchical features, including global, point-level, and pixel-aligned features, to facilitate informative encoding.
arXiv Detail & Related papers (2023-03-22T17:59:12Z)
- You Only Train Once: Multi-Identity Free-Viewpoint Neural Human Rendering from Monocular Videos [10.795522875068073]
You Only Train Once (YOTO) is a dynamic human generation framework that performs free-viewpoint rendering of different human identities with distinct motions.
In this paper, we propose a set of learnable identity codes to expand the capability of the framework for multi-identity free-viewpoint rendering.
YOTO shows state-of-the-art performance on all evaluation metrics while showing significant benefits in training and inference efficiency as well as rendering quality.
arXiv Detail & Related papers (2023-03-10T10:23:17Z)
- MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos [23.09306118872098]
We propose MonoNeRF to simultaneously learn point features and scene flows with point trajectory and feature correspondence constraints across frames.
Experiments show that our MonoNeRF is able to learn from multiple scenes and support new applications such as scene editing, unseen frame synthesis, and fast novel scene adaptation.
arXiv Detail & Related papers (2022-12-26T09:20:55Z)
- HiFECap: Monocular High-Fidelity and Expressive Capture of Human Performances [84.7225785061814]
HiFECap simultaneously captures human pose, clothing, facial expression, and hands just from a single RGB video.
Our method also captures high-frequency details, such as deforming wrinkles on clothing, better than previous works.
arXiv Detail & Related papers (2022-10-11T17:57:45Z)
- Human View Synthesis using a Single Sparse RGB-D Input [16.764379184593256]
We present a novel view synthesis framework to generate realistic renders from unseen views of any human captured from a single-view sensor with sparse RGB-D.
An enhancer network improves the overall fidelity, even in areas occluded in the original view, producing crisp renders with fine details.
arXiv Detail & Related papers (2021-12-27T20:13:53Z)
- Geometry-Guided Progressive NeRF for Generalizable and Efficient Neural Human Rendering [139.159534903657]
We develop a generalizable and efficient Neural Radiance Field (NeRF) pipeline for high-fidelity free-viewpoint human body details.
To better tackle self-occlusion, we devise a geometry-guided multi-view feature integration approach.
For achieving higher rendering efficiency, we introduce a geometry-guided progressive rendering pipeline.
arXiv Detail & Related papers (2021-12-08T14:42:10Z)
- Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering [34.80975358673563]
We propose a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture.
Experiments on the ZJU-MoCap and AIST datasets show that our method significantly outperforms recent generalizable NeRF methods on unseen identities and poses.
arXiv Detail & Related papers (2021-09-15T17:32:46Z)
- Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans [56.63912568777483]
This paper addresses the challenge of novel view synthesis for a human performer from a very sparse set of camera views.
We propose Neural Body, a new human body representation which assumes that the learned neural representations at different frames share the same set of latent codes anchored to a deformable mesh.
Experiments on ZJU-MoCap show that our approach outperforms prior works by a large margin in terms of novel view synthesis quality.
arXiv Detail & Related papers (2020-12-31T18:55:38Z)