NormalGAN: Learning Detailed 3D Human from a Single RGB-D Image
- URL: http://arxiv.org/abs/2007.15340v1
- Date: Thu, 30 Jul 2020 09:35:46 GMT
- Title: NormalGAN: Learning Detailed 3D Human from a Single RGB-D Image
- Authors: Lizhen Wang, Xiaochen Zhao, Tao Yu, Songtao Wang, Yebin Liu
- Abstract summary: We propose a fast adversarial learning-based method to reconstruct the complete and detailed 3D human from a single RGB-D image.
Given a consumer RGB-D sensor, NormalGAN can generate complete and detailed 3D human reconstruction results at 20 fps.
- Score: 34.79657678041356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose NormalGAN, a fast adversarial learning-based method to reconstruct
the complete and detailed 3D human from a single RGB-D image. Given a single
front-view RGB-D image, NormalGAN performs two steps: front-view RGB-D
rectification and back-view RGB-D inference. The final model is then generated
by simply combining the front-view and back-view RGB-D information. However,
inferring the back-view RGB-D image with high-quality geometric details and
plausible texture is not trivial. Our key observation is that normal maps
generally encode much more 3D surface detail than RGB and depth images.
Therefore, learning geometric details from normal maps is superior to learning
from other representations. In NormalGAN, an adversarial learning framework
conditioned on normal maps is introduced, which not only improves front-view
depth denoising performance but also infers the back-view depth image with
surprisingly fine geometric details. Moreover, for texture recovery, we remove
shading information from the front-view RGB image based on the refined normal
map, which further improves the quality of the back-view color inference.
Results and experiments on both the test dataset and real captured data
demonstrate the superior performance of our approach. Given a consumer RGB-D
sensor, NormalGAN can generate complete and detailed 3D human
reconstruction results at 20 fps, which further enables convenient interactive
experiences in telepresence, AR/VR and gaming scenarios.
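
The "simply combining" step can be pictured with a minimal sketch, not the authors' code: assuming (a common convention in this line of work, though the paper may differ) that the back-view depth map is predicted on the same image grid and in the same camera frame as the front view, the fused model is just the union of the two back-projected, colored point clouds. The intrinsics fx, fy, cx, cy below are hypothetical placeholders.

```python
# Hedged sketch: fuse front- and back-view RGB-D into one colored point cloud.
# Assumes both depth maps share the front camera's image grid and frame;
# fx, fy, cx, cy are hypothetical placeholder intrinsics.
import numpy as np

def unproject(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Back-project an H x W depth map (meters) into an (H*W, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def fuse_views(front_depth, back_depth, front_rgb, back_rgb):
    """Union of the two back-projected surfaces with their per-pixel colors."""
    points = np.concatenate([unproject(front_depth), unproject(back_depth)])
    colors = np.concatenate([front_rgb.reshape(-1, 3), back_rgb.reshape(-1, 3)])
    return points, colors
```

A watertight mesh would additionally require stitching the two surfaces along the silhouette, which this sketch omits.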
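The key observation, that normal maps expose surface detail which raw depth hides, can be made concrete: normals are first-order derivatives of the depth surface, so fine geometry shows up directly in them. Below is a hedged sketch (again not the paper's code; the same placeholder intrinsics are assumed) that estimates a normal map from a depth map via the cross product of local tangent vectors.

```python
# Hedged sketch: per-pixel normals from a depth map via finite differences.
import numpy as np

def depth_to_normals(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Back-project an H x W depth map (meters) and estimate unit normals."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel to a 3D point in camera coordinates.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)           # (H, W, 3)
    # Tangent vectors along image rows and columns via finite differences.
    du = np.gradient(points, axis=1)
    dv = np.gradient(points, axis=0)
    # The surface normal is the normalized cross product of the two tangents.
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    return n                                            # (H, W, 3) unit normals

# Usage on synthetic data: a plane slanted along the vertical image axis.
depth = 1.0 + 0.001 * np.arange(480)[:, None] * np.ones((480, 640))
normals = depth_to_normals(depth)
print(normals[240, 320])  # near-constant normal everywhere on the plane
```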
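For the texture-recovery step, the abstract says shading is removed from the front-view RGB image using the refined normal map, but it does not give the formula. One standard way to do this, shown as an assumption-laden sketch below rather than the paper's exact formulation, is to model the image as Lambertian under second-order spherical-harmonic (SH) lighting, fit the nine lighting coefficients by least squares, and divide the fitted shading out.

```python
# Hedged sketch: Lambertian shading removal with a normal map and 2nd-order SH
# lighting. The paper's actual formulation may differ.
import numpy as np

def sh_basis(normals):
    """Second-order SH basis (9 terms) evaluated at unit normals (N, 3)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        np.ones_like(x), y, z, x,
        x * y, y * z, 3.0 * z ** 2 - 1.0, x * z, x ** 2 - y ** 2,
    ], axis=1)

def remove_shading(rgb, normals):
    """Fit SH lighting to the gray image, then divide the shading out."""
    h, w, _ = rgb.shape
    gray = rgb.reshape(-1, 3).mean(axis=1)              # (H*W,) intensity
    basis = sh_basis(normals.reshape(-1, 3))            # (H*W, 9)
    # Least-squares estimate of the 9 lighting coefficients.
    coeffs, *_ = np.linalg.lstsq(basis, gray, rcond=None)
    shading = (basis @ coeffs).reshape(h, w, 1)
    albedo = rgb / np.clip(shading, 1e-3, None)         # de-shaded texture
    return albedo, shading
```

Dividing by the estimated shading leaves an approximate albedo, a more view-independent signal that is easier to extend to the unseen back view.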
Related papers
- Normal-guided Detail-Preserving Neural Implicit Functions for High-Fidelity 3D Surface Reconstruction [6.4279213810512665]
Current methods for learning neural implicit representations from RGB or RGBD images produce 3D surfaces with missing parts and details.
This paper demonstrates that training neural representations with first-order differential properties, i.e. surface normals, leads to highly accurate 3D surface reconstruction.
arXiv Detail & Related papers (2024-06-07T11:48:47Z)
- ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D image [40.03212588672639]
ANIM is a novel method that reconstructs arbitrary 3D human shapes from single-view RGB-D images with an unprecedented level of accuracy.
Our model learns geometric details from both pixel-aligned and voxel-aligned features to leverage depth information.
Experiments demonstrate that ANIM outperforms state-of-the-art works that use RGB, surface normals, point cloud or RGB-D data as input.
arXiv Detail & Related papers (2024-03-15T14:45:38Z)
- A novel approach for holographic 3D content generation without depth map [2.905273049932301]
We propose a deep learning-based method to synthesize the volumetric digital holograms using only the given RGB image.
Through experiments, we demonstrate that the volumetric hologram generated through our proposed model is more accurate than that of competitive models.
arXiv Detail & Related papers (2023-09-26T14:37:31Z)
- DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation [76.81628995237058]
DFormer is a novel framework to learn transferable representations for RGB-D segmentation tasks.
It pretrains the backbone using image-depth pairs from ImageNet-1K.
DFormer achieves new state-of-the-art performance on two popular RGB-D tasks.
arXiv Detail & Related papers (2023-09-18T11:09:11Z)
- Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision [51.385731364529306]
This paper focuses on perceiving and navigating 3D environments using echoes and an RGB image.
In particular, we perform depth estimation by fusing the RGB image with echoes received from multiple orientations.
We show that the echoes provide holistic and inexpensive information about 3D structures, complementing the RGB image.
arXiv Detail & Related papers (2022-07-03T22:31:47Z)
- Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images [89.81919625224103]
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images.
We present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection.
arXiv Detail & Related papers (2022-01-01T03:02:27Z)
- Semantic-embedded Unsupervised Spectral Reconstruction from Single RGB Images in the Wild [48.44194221801609]
We propose a new lightweight and end-to-end learning-based framework for spectral reconstruction from single RGB images.
We progressively spread the differences between input RGB images and RGB images re-projected from the recovered hyperspectral (HS) images via effective camera spectral response function estimation.
Our method significantly outperforms state-of-the-art unsupervised methods and even exceeds the latest supervised method under some settings.
arXiv Detail & Related papers (2021-08-15T05:19:44Z)
- Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images [69.5662419067878]
Grounding referring expressions in RGBD images is an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD image where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
arXiv Detail & Related papers (2021-03-14T11:18:50Z)
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt at realizing a unified depth-aware framework that takes only RGB information as input at inference.
Our method not only surpasses state-of-the-art performance on five public RGB SOD benchmarks, but also outperforms RGBD-based methods on five benchmarks by a large margin.
arXiv Detail & Related papers (2020-05-30T13:40:03Z)