Neural Capture of Animatable 3D Human from Monocular Video
- URL: http://arxiv.org/abs/2208.08728v1
- Date: Thu, 18 Aug 2022 09:20:48 GMT
- Title: Neural Capture of Animatable 3D Human from Monocular Video
- Authors: Gusi Te, Xiu Li, Xiao Li, Jinglu Wang, Wei Hu, Yan Lu
- Abstract summary: We present a novel paradigm for building an animatable 3D human representation from monocular video input, such that it can be rendered in any unseen pose and view.
Our method is based on a dynamic Neural Radiance Field (NeRF) rigged by a mesh-based parametric 3D human model serving as a geometry proxy.
- Score: 38.974181971541846
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel paradigm for building an animatable 3D human representation from monocular video input, such that it can be rendered in any unseen pose and view. Our method is based on a dynamic Neural Radiance Field (NeRF) rigged by a mesh-based parametric 3D human model that serves as a geometry proxy. Previous methods usually rely on multi-view videos or accurate 3D geometry information as additional inputs; moreover, most of them suffer from degraded quality when generalized to unseen poses. We identify that the key to generalization is a good input embedding for querying the dynamic NeRF: a good input embedding should define an injective mapping in the full volumetric space, guided by surface mesh deformation under pose variation. Based on this observation, we propose to embed the input query with its relationship to local surface regions spanned by a set of geodesic nearest neighbors on mesh vertices. By including both position and relative distance information, our embedding defines a distance-preserving deformation mapping and generalizes well to unseen poses. To reduce the dependency on additional inputs, we first initialize per-frame 3D meshes using off-the-shelf tools and then propose a pipeline to jointly optimize the NeRF and refine the initial meshes. Extensive experiments show that our method can synthesize plausible human rendering results under unseen poses and views.
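To make the embedding idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' code. Euclidean K-nearest vertices stand in for the paper's geodesic nearest neighbors, and all names (`embed_query`, `posed_verts`, `canonical_verts`) are hypothetical.

```python
# Minimal sketch (not the authors' code) of the proposed query embedding:
# a query point is described by its relationship to a local surface region
# spanned by K nearby mesh vertices. Euclidean K-nearest vertices are used
# here as a simplification of the paper's geodesic nearest neighbors.
import numpy as np
from scipy.spatial import cKDTree

def embed_query(x, posed_verts, canonical_verts, k=8):
    """Embed a 3D query point x (shape (3,)) relative to a deformed mesh.

    posed_verts:     (V, 3) mesh vertices deformed by the current pose.
    canonical_verts: (V, 3) the same vertices in the canonical (rest) pose.
    Returns a feature vector combining position and relative distance
    information, so the embedding follows the surface as the mesh deforms.
    """
    tree = cKDTree(posed_verts)
    dists, idx = tree.query(x, k=k)          # distances to the k neighbors
    rel = x[None, :] - posed_verts[idx]      # offsets in posed space
    # Per neighbor: (canonical anchor, offset, distance). Distances are
    # preserved under the surface-guided deformation, which is what lets
    # the mapping generalize to unseen poses.
    feats = np.concatenate(
        [canonical_verts[idx], rel, dists[:, None]], axis=1)  # (k, 7)
    return feats.reshape(-1)                 # flat (k * 7,) embedding

# Usage: feed embed_query(x, ...) to the NeRF MLP instead of the raw point.
verts_posed = np.random.rand(6890, 3)        # e.g. an SMPL-sized mesh
verts_canon = np.random.rand(6890, 3)
z = embed_query(np.array([0.5, 0.5, 0.5]), verts_posed, verts_canon)
```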
Related papers
- Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation [32.30055363306321]
We propose a paradigm for seamlessly unifying different human pose and shape-related tasks and datasets.
Our formulation is centered on the ability, at both training and test time, to query any point of the human volume.
We can naturally exploit differently annotated data sources including mesh, 2D/3D skeleton and dense pose, without having to convert between them.
arXiv Detail & Related papers (2024-07-10T10:44:18Z)
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- Sampling is Matter: Point-guided 3D Human Mesh Reconstruction [0.0]
This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image.
Experimental results on benchmark datasets show that the proposed method efficiently improves the performance of 3D human mesh reconstruction.
arXiv Detail & Related papers (2023-04-19T08:45:26Z)
- 3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data [24.97027425606138]
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem.
We present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image.
Our approach achieves state-of-the-art reconstruction performance across several real-world datasets.
arXiv Detail & Related papers (2023-02-24T20:37:27Z)
- Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion [54.151979979158085]
We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available.
We leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution.
Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
arXiv Detail & Related papers (2022-11-21T17:42:42Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) conditioned on the learned 3D representation to perform volume rendering; a minimal compositing sketch follows this entry.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
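The volume rendering mentioned in the entry above (and used by dynamic NeRF methods generally) reduces to alpha compositing of predicted densities and colors along each camera ray. Below is a minimal sketch of that standard compositing step, not code from either paper; the names are hypothetical and random arrays stand in for the MLP outputs.

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Standard NeRF-style alpha compositing along one ray.

    densities: (N,)   non-negative volume densities at N samples.
    colors:    (N, 3) RGB predicted at each sample.
    deltas:    (N,)   distances between consecutive samples.
    """
    alpha = 1.0 - np.exp(-densities * deltas)  # opacity of each segment
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha                    # per-sample contribution
    return (weights[:, None] * colors).sum(axis=0)  # final pixel RGB

# Usage with a hypothetical conditioned MLP: for each sample along a ray,
# the MLP would return (density, color); random stand-ins are used here.
n = 64
rgb = composite_ray(np.random.rand(n), np.random.rand(n, 3), np.full(n, 0.05))
```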
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolutional network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable renderer for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- DC-GNet: Deep Mesh Relation Capturing Graph Convolution Network for 3D Human Shape Reconstruction [1.290382979353427]
We propose a Deep Mesh Relation Capturing Graph Convolution Network, DC-GNet, with a shape completion task for 3D human shape reconstruction.
Our approach encodes mesh structure from subtler relations between nodes, including those in more distant regions.
Our shape completion module alleviates the performance degradation issue in the outdoor scene.
arXiv Detail & Related papers (2021-08-27T16:43:32Z)
- Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose [70.23652933572647]
We propose a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose.
We show that our Pose2Mesh outperforms previous 3D human pose and mesh estimation methods on various benchmark datasets; a toy graph-convolution sketch follows this entry.
arXiv Detail & Related papers (2020-08-20T16:01:56Z)
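For readers unfamiliar with graph convolutions over a mesh, here is a toy, self-contained sketch of a single normalized graph-convolution layer of the kind such systems stack to regress 3D vertex coordinates; it is not Pose2Mesh's implementation, and all sizes, weights, and names are hypothetical stand-ins.

```python
# Toy sketch (not Pose2Mesh itself): one graph-convolution layer that
# propagates per-vertex features over mesh edges.
import numpy as np

def graph_conv(X, A, W):
    """One GCN layer: aggregate neighbor features, then project.

    X: (V, F_in)     per-vertex input features.
    A: (V, V)        mesh adjacency (1 where vertices share an edge).
    W: (F_in, F_out) learnable projection.
    """
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)  # ReLU

# Usage: two layers mapping pose-derived features to 3D coordinates.
V = 96                                          # coarse mesh vertices
rng = np.random.default_rng(0)
A = (rng.random((V, V)) < 0.05).astype(float)
A = np.maximum(A, A.T)                          # symmetric adjacency
X = rng.standard_normal((V, 32))                # features lifted from 2D pose
H = graph_conv(X, A, rng.standard_normal((32, 16)) * 0.1)
coords = H @ rng.standard_normal((16, 3)) * 0.1  # (V, 3) vertex positions
```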
- Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction [123.62341095156611]
Implicit functions represented as deep learning approximations are powerful for reconstructing 3D surfaces.
Such features are essential in building flexible models for both computer graphics and computer vision.
We present methodology that combines detail-rich implicit functions and parametric representations.
arXiv Detail & Related papers (2020-07-22T13:46:14Z)