SHERF: Generalizable Human NeRF from a Single Image
- URL: http://arxiv.org/abs/2303.12791v2
- Date: Wed, 16 Aug 2023 17:58:35 GMT
- Title: SHERF: Generalizable Human NeRF from a Single Image
- Authors: Shoukang Hu, Fangzhou Hong, Liang Pan, Haiyi Mei, Lei Yang, Ziwei Liu
- Abstract summary: SHERF is the first generalizable Human NeRF model for recovering animatable 3D humans from a single input image.
We propose a bank of 3D-aware hierarchical features, including global, point-level, and pixel-aligned features, to facilitate informative encoding.
- Score: 59.10589479808622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing Human NeRF methods for reconstructing 3D humans typically rely on
multiple 2D images from multi-view cameras or monocular videos captured from
fixed camera views. However, in real-world scenarios, human images are often
captured from random camera angles, presenting challenges for high-quality 3D
human reconstruction. In this paper, we propose SHERF, the first generalizable
Human NeRF model for recovering animatable 3D humans from a single input image.
SHERF extracts and encodes 3D human representations in canonical space,
enabling rendering and animation from free views and poses. To achieve
high-fidelity novel view and pose synthesis, the encoded 3D human
representations should capture both global appearance and local fine-grained
textures. To this end, we propose a bank of 3D-aware hierarchical features,
including global, point-level, and pixel-aligned features, to facilitate
informative encoding. Global features enhance the information extracted from
the single input image and complement the information missing from the partial
2D observation. Point-level features provide strong clues of 3D human
structure, while pixel-aligned features preserve more fine-grained details. To
effectively integrate the 3D-aware hierarchical feature bank, we design a
feature fusion transformer. Extensive experiments on THuman, RenderPeople,
ZJU-MoCap, and HuMMan datasets demonstrate that SHERF achieves state-of-the-art
performance, with better generalizability for novel view and pose synthesis.
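
The abstract describes the pipeline only at a high level. Below is a minimal PyTorch sketch of how a 3D-aware hierarchical feature bank (global, point-level, and pixel-aligned features) and a feature fusion transformer could feed a NeRF head. All module names, tensor shapes, the `cam` projection helper, and the precomputed `point_feats` input are illustrative assumptions for exposition, not the released SHERF implementation.

```python
# A minimal sketch of a hierarchical feature bank + fusion transformer,
# assuming hypothetical modules and shapes (not the official SHERF code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalFeatureNeRF(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # Global feature: one vector summarizing the whole input image,
        # complementing regions unseen in the partial 2D observation.
        self.global_enc = nn.Sequential(
            nn.Conv2d(3, feat_dim, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # 2D feature map for pixel-aligned features (fine-grained detail).
        self.pixel_enc = nn.Conv2d(3, feat_dim, 3, padding=1)
        # Fusion transformer attends over the three feature tokens per point.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # NeRF head: fused feature + 3D point -> (RGB, density).
        self.head = nn.Sequential(
            nn.Linear(feat_dim + 3, 128), nn.ReLU(), nn.Linear(128, 4),
        )

    def forward(self, image, pts, point_feats, cam):
        """image: (B,3,H,W); pts: (B,N,3) canonical-space query points;
        point_feats: (B,N,feat_dim) point-level features, e.g. gathered from
        features anchored to SMPL vertices (treated as precomputed here);
        cam: callable projecting pts to normalized image coords in [-1, 1]."""
        B, N, _ = pts.shape
        g = self.global_enc(image).flatten(1)              # (B, C)
        g = g.unsqueeze(1).expand(B, N, -1)                # broadcast per point
        fmap = self.pixel_enc(image)                       # (B, C, H, W)
        uv = cam(pts)                                      # (B, N, 2)
        px = F.grid_sample(fmap, uv.unsqueeze(2),
                           align_corners=True)             # (B, C, N, 1)
        px = px.squeeze(-1).permute(0, 2, 1)               # (B, N, C)
        tokens = torch.stack([g, point_feats, px], 2)      # (B, N, 3, C)
        fused = self.fusion(tokens.flatten(0, 1)).mean(1)  # (B*N, C)
        return self.head(torch.cat([fused.view(B, N, -1), pts], -1))
```

Under these assumptions, the four output channels (RGB and density) would be composited along each camera ray with standard NeRF volume rendering, and animating the subject amounts to posing the canonical query points with SMPL before encoding.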
Related papers
- Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses [9.529416246409355]
We present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input.
As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation.
arXiv Detail & Related papers (2024-04-22T17:59:50Z)
- InceptionHuman: Controllable Prompt-to-NeRF for Photorealistic 3D Human Generation [61.62346472443454]
InceptionHuman is a prompt-to-NeRF framework that allows easy control via a combination of prompts in different modalities to generate photorealistic 3D humans.
InceptionHuman achieves consistent 3D human generation within a progressively refined NeRF space.
arXiv Detail & Related papers (2023-11-27T15:49:41Z)
- Single-Image 3D Human Digitization with Shape-Guided Diffusion [31.99621159464388]
NeRF and its variants typically require videos or images from different viewpoints.
We present an approach to generate a 360-degree view of a person with a consistent, high-resolution appearance from a single input image.
arXiv Detail & Related papers (2023-11-15T18:59:56Z)
- DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models [55.71306021041785]
We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars.
We leverage the SMPL model to provide shape and pose guidance for the generation.
We also jointly optimize the losses computed from the full body and from the zoomed-in 3D head to alleviate the common multi-face "Janus" problem.
arXiv Detail & Related papers (2023-04-03T12:11:51Z)
- 3DHumanGAN: 3D-Aware Human Image Generation with 3D Pose Mapping [37.14866512377012]
3DHumanGAN is a 3D-aware generative adversarial network that synthesizes photorealistic images of full-body humans.
We propose a novel generator architecture in which a 2D convolutional backbone is modulated by a 3D pose mapping network.
arXiv Detail & Related papers (2022-12-14T17:59:03Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- DRaCoN -- Differentiable Rasterization Conditioned Neural Radiance Fields for Articulated Avatars [92.37436369781692]
We present DRaCoN, a framework for learning full-body volumetric avatars.
It exploits the advantages of both 2D and 3D neural rendering techniques.
Experiments on the challenging ZJU-MoCap and Human3.6M datasets indicate that DRaCoN outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T17:59:15Z)
- 3D-Aware Semantic-Guided Generative Model for Human Synthesis [67.86621343494998]
This paper proposes a 3D-aware Semantic-Guided Generative Model (3D-SGAN) for human image synthesis.
Our experiments on the DeepFashion dataset show that 3D-SGAN significantly outperforms the most recent baselines.
arXiv Detail & Related papers (2021-12-02T17:10:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.