Semantic-Human: Neural Rendering of Humans from Monocular Video with
Human Parsing
- URL: http://arxiv.org/abs/2308.09894v1
- Date: Sat, 19 Aug 2023 03:18:19 GMT
- Title: Semantic-Human: Neural Rendering of Humans from Monocular Video with
Human Parsing
- Authors: Jie Zhang, Pengcheng Shi, Zaiwang Gu, Yiyang Zhou, Zhi Wang
- Abstract summary: We present Semantic-Human, a novel method that achieves photorealistic details and viewpoint-consistent human parsing for the neural rendering of humans.
Specifically, we extend neural radiance fields (NeRF) to jointly encode semantics, appearance and geometry to achieve accurate 2D semantic labels.
We also showcase various compelling applications, including label denoising, label synthesis and image editing.
- Score: 14.264835399504376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The neural rendering of humans is a topic of great research significance.
However, previous works mostly focus on achieving photorealistic details,
neglecting the exploration of human parsing. Additionally, classical semantic
segmentation methods are limited in their ability to efficiently represent
fine-grained results under complex motion. Human parsing is inherently related
to radiance reconstruction, as similar appearance and geometry often correspond
to similar semantic parts. Furthermore, previous works often design a motion
field that maps from the observation space to the canonical space, which tends
to exhibit either underfitting or overfitting, resulting in limited
generalization. In this paper, we present Semantic-Human, a novel method that
achieves both photorealistic details and viewpoint-consistent human parsing for
the neural rendering of humans. Specifically, we extend neural radiance fields
(NeRF) to jointly encode semantics, appearance and geometry to achieve accurate
2D semantic labels using noisy pseudo-label supervision. Leveraging the
inherent consistency and smoothness properties of NeRF, Semantic-Human achieves
consistent human parsing in both continuous and novel views. We also introduce
constraints derived from the SMPL surface for the motion field and
regularization for the recovered volumetric geometry. We have evaluated the
model using the ZJU-MoCap dataset, and the obtained highly competitive results
demonstrate the effectiveness of our proposed Semantic-Human. We also showcase
various compelling applications, including label denoising, label synthesis and
image editing, and empirically validate its advantageous properties.
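
The abstract's core idea is extending NeRF so that each sample along a ray carries semantic logits in addition to density and color, with the same rendering weights compositing both; the shared weights are what make the rendered parsing view-consistent. The paper's implementation is not given here, so the following is only a minimal NumPy sketch of that compositing step under standard NeRF volume-rendering assumptions (the function name and array shapes are illustrative, not from the paper):

```python
import numpy as np

def composite_ray(sigmas, deltas, rgbs, sem_logits):
    """Standard NeRF volume rendering, extended with a semantic head.

    sigmas:     (N,)   per-sample volume densities along the ray
    deltas:     (N,)   distances between adjacent samples
    rgbs:       (N, 3) per-sample colors
    sem_logits: (N, K) per-sample semantic logits (K body-part classes)
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)            # per-sample opacity
    trans = np.cumprod(1.0 - alphas + 1e-10)           # accumulated transparency
    trans = np.concatenate([[1.0], trans[:-1]])        # shift so T_i = prod_{j<i}
    weights = alphas * trans                           # rendering weights
    rgb = (weights[:, None] * rgbs).sum(axis=0)        # rendered pixel color
    sem = (weights[:, None] * sem_logits).sum(axis=0)  # rendered semantic logits
    return rgb, sem, weights
```

Because `weights` is computed once from geometry and reused for both outputs, a pixel's predicted part label is tied to the same surface that produced its color, which is consistent with the abstract's claim that similar appearance and geometry correspond to similar semantic parts.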
Related papers
- Label-free Neural Semantic Image Synthesis [12.194020204848492]
We introduce the concept of neural semantic image synthesis, which uses neural layouts extracted from pre-trained foundation models as conditioning.
We experimentally show that images synthesized via neural semantic image synthesis achieve similar or superior pixel-level alignment of semantic classes.
We show that images generated by neural layout conditioning can effectively augment real data for training various perception tasks.
arXiv Detail & Related papers (2024-07-01T20:30:23Z)
- InceptionHuman: Controllable Prompt-to-NeRF for Photorealistic 3D Human Generation [61.62346472443454]
InceptionHuman is a prompt-to-NeRF framework that allows easy control via a combination of prompts in different modalities to generate photorealistic 3D humans.
InceptionHuman achieves consistent 3D human generation within a progressively refined NeRF space.
arXiv Detail & Related papers (2023-11-27T15:49:41Z)
- HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion [114.15397904945185]
We propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts.
Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network.
Our framework yields the state-of-the-art performance, generating hyper-realistic human images under diverse scenarios.
arXiv Detail & Related papers (2023-10-12T17:59:34Z)
- Semantic Brain Decoding: from fMRI to conceptually similar image reconstruction of visual stimuli [0.29005223064604074]
We propose a novel approach to brain decoding that also relies on semantic and contextual similarity.
We employ an fMRI dataset of natural image vision and create a deep learning decoding pipeline inspired by the existence of both bottom-up and top-down processes in human vision.
We produce reconstructions of visual stimuli that match the original content very well on a semantic level, surpassing the state of the art in previous literature.
arXiv Detail & Related papers (2022-12-13T16:54:08Z)
- Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors [98.24047528960406]
We propose a new method for learning a generalized animatable neural representation from a sparse set of multi-view imagery of multiple persons.
The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.
arXiv Detail & Related papers (2022-08-25T07:36:46Z)
- HSPACE: Synthetic Parametric Humans Animated in Complex Environments [67.8628917474705]
We build a large-scale photo-realistic dataset, Human-SPACE, of animated humans placed in complex indoor and outdoor environments.
We combine a hundred diverse individuals of varying ages, gender, proportions, and ethnicity, with hundreds of motions and scenes, in order to generate an initial dataset of over 1 million frames.
Assets are generated automatically, at scale, and are compatible with existing real time rendering and game engines.
arXiv Detail & Related papers (2021-12-23T22:27:55Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering [34.80975358673563]
We propose a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture.
Experiments on the ZJU-MoCap and AIST datasets show that our method significantly outperforms recent generalizable NeRF methods on unseen identities and poses.
arXiv Detail & Related papers (2021-09-15T17:32:46Z)
- Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control [80.79820002330457]
We propose a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses.
Our method achieves better quality than the state of the art on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses.
arXiv Detail & Related papers (2021-06-03T17:40:48Z)
- Learning Compositional Radiance Fields of Dynamic Human Heads [13.272666180264485]
We propose a novel compositional 3D representation that combines the best of previous methods to produce both higher-resolution and faster results.
Differentiable volume rendering is employed to compute photo-realistic novel views of the human head and upper body.
Our approach achieves state-of-the-art results for synthesizing novel views of dynamic human heads and the upper body.
arXiv Detail & Related papers (2020-12-17T22:19:27Z)
- Grasping Field: Learning Implicit Representations for Human Grasps [16.841780141055505]
We propose an expressive representation for human grasp modelling that is efficient and easy to integrate with deep neural networks.
We name this 3D to 2D mapping as Grasping Field, parameterize it with a deep neural network, and learn it from data.
Our generative model is able to synthesize high-quality human grasps, given only a 3D object point cloud.
arXiv Detail & Related papers (2020-08-10T23:08:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.