Detailed 3D Human Body Reconstruction from Multi-view Images Combining
Voxel Super-Resolution and Learned Implicit Representation
- URL: http://arxiv.org/abs/2012.06178v1
- Date: Fri, 11 Dec 2020 08:07:39 GMT
- Authors: Zhongguo Li, Magnus Oskarsson, Anders Heyden
- Abstract summary: We propose a coarse-to-fine method to reconstruct a detailed 3D human body from multi-view images.
Coarse 3D models are estimated by learning an implicit representation based on multi-scale features.
Refined, detailed 3D human body models are then produced by voxel super-resolution, which preserves fine details.
- Score: 12.459968574683625
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of reconstructing detailed 3D human body models from images is
interesting but challenging in computer vision due to the high degrees of freedom of
human bodies. To tackle this problem, we propose a coarse-to-fine method that
reconstructs a detailed 3D human body from multi-view images by combining voxel
super-resolution with a learned implicit representation. First, coarse 3D models are
estimated by learning an implicit representation based on multi-scale features, which
are extracted from the multi-view images by multi-stage hourglass networks. Then,
taking the low-resolution voxel grids generated from the coarse 3D models as input,
voxel super-resolution based on an implicit representation is learned through a
multi-stage 3D convolutional neural network. Finally, refined, detailed 3D human body
models are produced by the voxel super-resolution, which preserves details and reduces
the false reconstructions of the coarse 3D models. Benefiting from the implicit
representation, the training process of our method is memory efficient, and the
detailed 3D human body produced from multi-view images is represented as a continuous
decision boundary with high-resolution geometry. In addition, the coarse-to-fine
scheme based on voxel super-resolution simultaneously removes false reconstructions
and preserves appearance details in the final reconstruction. In experiments, our
method achieves competitive 3D human body reconstructions, both quantitatively and
qualitatively, from images with various poses and shapes on both real and synthetic
datasets.
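The two-stage pipeline described above can be sketched as follows. This is a minimal numpy illustration of the data flow only: the `occupancy` function here is an analytic stand-in for the paper's learned implicit function (an MLP conditioned on multi-scale hourglass features), and `upsample` uses plain nearest-neighbour repetition where the paper learns the super-resolution mapping with a multi-stage 3D CNN. All function names, the grid resolution, and the upsampling factor are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def occupancy(points):
    # Stand-in "learned" implicit function: inside/outside test for a
    # unit sphere. In the paper, an MLP conditioned on multi-scale image
    # features plays this role; this analytic field is illustration only.
    return (np.linalg.norm(points, axis=-1) <= 1.0).astype(np.float32)

def voxelize(field, res, bound=1.5):
    # Sample the implicit field on a regular 3D grid to obtain a voxel
    # grid, analogous to discretizing the coarse 3D model.
    axis = np.linspace(-bound, bound, res)
    pts = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return field(pts.reshape(-1, 3)).reshape(res, res, res)

def upsample(vox, factor):
    # Placeholder super-resolution: nearest-neighbour upsampling along
    # each axis. The paper instead learns this mapping with a multi-stage
    # 3D CNN that adds detail and suppresses false reconstructions; this
    # only shows the input/output relationship between the two stages.
    for axis in range(3):
        vox = np.repeat(vox, factor, axis=axis)
    return vox

coarse = voxelize(occupancy, 16)   # stage 1: low-resolution voxel grid
fine = upsample(coarse, 4)         # stage 2: refined grid at 4x resolution
print(coarse.shape, fine.shape)    # (16, 16, 16) (64, 64, 64)
```

In the actual method, the final detailed surface would then be extracted from the refined grid as the continuous decision boundary of the implicit function, e.g. via marching cubes.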
Related papers
- COSMU: Complete 3D human shape from monocular unconstrained images [24.08612483445495]
We present a novel framework to reconstruct complete 3D human shapes from a given target image by leveraging monocular unconstrained images.
The objective of this work is to reproduce high-quality details in regions of the reconstructed human body that are not visible in the input target.
arXiv Detail & Related papers (2024-07-15T10:06:59Z) - HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos [52.23323966700072]
We present a framework for acquiring human avatars that are attached with high-resolution physically-based material textures and mesh from monocular video.
Our method introduces a novel information fusion strategy to combine the information from the monocular video and synthesize virtual multi-view images.
Experiments show that our approach outperforms previous representations in terms of fidelity, and the explicit triangular-mesh result supports deployment in common rendering pipelines.
arXiv Detail & Related papers (2024-05-18T11:49:09Z) - SemanticHuman-HD: High-Resolution Semantic Disentangled 3D Human Generation [12.063815354055052]
We introduce SemanticHuman-HD, the first method to achieve semantic disentangled human image synthesis.
SemanticHuman-HD is also the first method to achieve 3D-aware image synthesis at $1024^2$ resolution.
Our method opens up exciting possibilities for various applications, including 3D garment generation, semantic-aware image synthesis, and controllable image synthesis.
arXiv Detail & Related papers (2024-03-15T10:18:56Z) - What You See is What You GAN: Rendering Every Pixel for High-Fidelity
Geometry in 3D GANs [82.3936309001633]
3D-aware Generative Adversarial Networks (GANs) have shown remarkable progress in learning to generate multi-view-consistent images and 3D geometries.
Yet, the significant memory and computational costs of dense sampling in volume rendering have forced 3D GANs to adopt patch-based training or employ low-resolution rendering with post-processing 2D super resolution.
We propose techniques to scale neural volume rendering to the much higher resolution of native 2D images, thereby resolving fine-grained 3D geometry with unprecedented detail.
arXiv Detail & Related papers (2024-01-04T18:50:38Z) - High-fidelity 3D Human Digitization from Single 2K Resolution Images [16.29087820634057]
We propose 2K2K, which constructs a large-scale 2K human dataset and infers 3D human models from 2K resolution images.
We also provide 2,050 3D human models, including texture maps, 3D joints, and SMPL parameters for research purposes.
arXiv Detail & Related papers (2023-03-27T11:22:54Z) - DRaCoN -- Differentiable Rasterization Conditioned Neural Radiance
Fields for Articulated Avatars [92.37436369781692]
We present DRaCoN, a framework for learning full-body volumetric avatars.
It exploits the advantages of both the 2D and 3D neural rendering techniques.
Experiments on the challenging ZJU-MoCap and Human3.6M datasets indicate that DRaCoN outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T17:59:15Z) - 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos [107.36352212367179]
We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
The proposed method is able to learn 3D body pose and shape across different resolutions with one single model.
We extend the RSC-Net to handle low-resolution videos and apply it to reconstruct textured 3D pedestrians from low-resolution input.
arXiv Detail & Related papers (2021-03-11T06:52:12Z) - Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using
Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z) - 3D Human Shape and Pose from a Single Low-Resolution Image with
Self-Supervised Learning [105.49950571267715]
Existing deep learning methods for 3D human shape and pose estimation rely on relatively high-resolution input images.
We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
We show that both these new training losses provide robustness when learning 3D shape and pose in a weakly-supervised manner.
arXiv Detail & Related papers (2020-07-27T16:19:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.