Human View Synthesis using a Single Sparse RGB-D Input
- URL: http://arxiv.org/abs/2112.13889v1
- Date: Mon, 27 Dec 2021 20:13:53 GMT
- Title: Human View Synthesis using a Single Sparse RGB-D Input
- Authors: Phong Nguyen, Nikolaos Sarafianos, Christoph Lassner, Janne Heikkila,
Tony Tung
- Abstract summary: We present a novel view synthesis framework to generate realistic renders from unseen views of any human captured from a single-view sensor with sparse RGB-D.
An enhancer network improves the overall fidelity, even in areas occluded in the original view, producing crisp renders with fine details.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Novel view synthesis for humans in motion is a challenging computer vision
problem that enables applications such as free-viewpoint video. Existing
methods typically use complex setups with multiple input views, 3D supervision,
or pre-trained models that do not generalize well to new identities. Aiming to
address these limitations, we present a novel view synthesis framework to
generate realistic renders from unseen views of any human captured from a
single-view sensor with sparse RGB-D, similar to a low-cost depth camera, and
without actor-specific models. We propose an architecture to learn dense
features in novel views obtained by sphere-based neural rendering, and create
complete renders using a global context inpainting model. Additionally, an
enhancer network improves the overall fidelity, even in occluded areas from
the original view, producing crisp renders with fine details. We show our
method generates high-quality novel views of synthetic and real human actors
given a single sparse RGB-D input. It generalizes to unseen identities, new
poses and faithfully reconstructs facial expressions. Our approach outperforms
prior human view synthesis methods and is robust to different levels of input
sparsity.
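The first stage the abstract describes lifts the sparse RGB-D input into 3D and re-renders it from the target camera. A minimal geometric sketch of that warping step, in NumPy (the function name, nearest-pixel splatting, and single-pixel "spheres" are simplifying assumptions for illustration; the paper itself uses learned sphere-based neural rendering followed by inpainting and enhancement networks):

```python
import numpy as np

def warp_rgbd_to_novel_view(depth, rgb, K, T_src_to_tgt, out_shape):
    """Project sparse RGB-D pixels from a source view into a target view.

    depth: (H, W) depth map, 0 where no measurement exists (sparse input)
    rgb:   (H, W, 3) source colors
    K:     (3, 3) camera intrinsics (assumed shared by both views)
    T_src_to_tgt: (4, 4) rigid transform from source to target camera
    """
    h, w = depth.shape
    # Pixel grid in homogeneous coordinates, restricted to valid depths
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    uv1 = np.stack([u[valid], v[valid], np.ones(valid.sum())], axis=0)
    # Back-project to 3D points in the source camera frame
    pts = np.linalg.inv(K) @ uv1 * depth[valid]
    # Rigid transform into the target camera frame
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    pts_tgt = (T_src_to_tgt @ pts_h)[:3]
    # Project into the target image and splat nearest-pixel
    proj = K @ pts_tgt
    z = proj[2]
    px = np.round(proj[0] / z).astype(int)
    py = np.round(proj[1] / z).astype(int)
    out = np.zeros((*out_shape, 3), dtype=rgb.dtype)
    ok = (z > 0) & (px >= 0) & (px < out_shape[1]) & (py >= 0) & (py < out_shape[0])
    out[py[ok], px[ok]] = rgb[valid][ok]
    return out
```

The warped image is incomplete wherever depth was missing or the surface was occluded; in the paper those holes are what the global context inpainting model and enhancer network fill in.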
Related papers
- GenLayNeRF: Generalizable Layered Representations with 3D Model
Alignment for Multi-Human View Synthesis [1.6574413179773757]
GenLayNeRF is a generalizable layered scene representation for free-viewpoint rendering of multiple human subjects.
We divide the scene into multi-human layers anchored by the 3D body meshes.
We extract point-wise image-aligned and human-anchored features which are correlated and fused.
arXiv Detail & Related papers (2023-09-20T20:37:31Z)
- Novel View Synthesis of Humans using Differentiable Rendering [50.57718384229912]
We present a new approach for synthesizing novel views of people in new poses.
Our synthesis makes use of diffuse Gaussian primitives that represent the underlying skeletal structure of a human.
Rendering these primitives yields a high-dimensional latent image, which is then transformed into an RGB image by a decoder network.
arXiv Detail & Related papers (2023-03-28T10:48:33Z)
- SHERF: Generalizable Human NeRF from a Single Image [59.10589479808622]
SHERF is the first generalizable Human NeRF model for recovering animatable 3D humans from a single input image.
We propose a bank of 3D-aware hierarchical features, including global, point-level, and pixel-aligned features, to facilitate informative encoding.
arXiv Detail & Related papers (2023-03-22T17:59:12Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
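The volume rendering this summary refers to composites MLP-predicted densities and colors along each ray. A minimal numerical quadrature in NumPy (function and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def volume_render(sigmas, colors, deltas):
    """NeRF-style volume-rendering quadrature for a single ray.

    sigmas: (N,) densities predicted at samples along the ray
    colors: (N, 3) RGB values predicted at those samples
    deltas: (N,) distances between consecutive samples
    Returns the composited ray color.
    """
    # Per-sample opacity from density and sample spacing
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: probability the ray reaches each sample unoccluded
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * trans
    return (weights[:, None] * colors).sum(axis=0)
```

A ray hitting a single opaque sample returns that sample's color; zero density everywhere returns black, matching the usual NeRF formulation.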
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z)
- HDhuman: High-quality Human Novel-view Rendering from Sparse Views [15.810495442598963]
We propose HDhuman, which uses a human reconstruction network with a pixel-aligned spatial transformer and a rendering network with geometry-guided pixel-wise feature integration.
Our approach outperforms prior generic and case-specific methods on both synthetic and real-world data.
arXiv Detail & Related papers (2022-01-20T13:04:59Z)
- Human Pose Manipulation and Novel View Synthesis using Differentiable Rendering [46.04980667824064]
We present a new approach for synthesizing novel views of people in new poses.
Our synthesis makes use of diffuse Gaussian primitives that represent the underlying skeletal structure of a human.
Rendering these primitives yields a high-dimensional latent image, which is then transformed into an RGB image by a decoder network.
arXiv Detail & Related papers (2021-11-24T19:00:07Z)
- Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans [56.63912568777483]
This paper addresses the challenge of novel view synthesis for a human performer from a very sparse set of camera views.
We propose Neural Body, a new human body representation which assumes that the learned neural representations at different frames share the same set of latent codes anchored to a deformable mesh.
Experiments on ZJU-MoCap show that our approach outperforms prior works by a large margin in terms of novel view synthesis quality.
arXiv Detail & Related papers (2020-12-31T18:55:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.