PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion
- URL: http://arxiv.org/abs/2409.10141v1
- Date: Mon, 16 Sep 2024 10:13:06 GMT
- Title: PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion
- Authors: Peng Li, Wangguandong Zheng, Yuan Liu, Tao Yu, Yangguang Li, Xingqun Qi, Mengfei Li, Xiaowei Chi, Siyu Xia, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo,
- Abstract summary: PSHuman is a novel framework that explicitly reconstructs human meshes utilizing priors from the multiview diffusion model.
It is found that directly applying multiview diffusion on single-view human images leads to severe geometric distortions.
To enhance cross-view body shape consistency of varied human poses, we condition the generative model on parametric models like SMPL-X.
- Score: 43.850899288337025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detailed and photorealistic 3D human modeling is essential for various applications and has seen tremendous progress. However, full-body reconstruction from a monocular RGB image remains challenging due to the ill-posed nature of the problem and sophisticated clothing topology with self-occlusions. In this paper, we propose PSHuman, a novel framework that explicitly reconstructs human meshes utilizing priors from the multiview diffusion model. It is found that directly applying multiview diffusion on single-view human images leads to severe geometric distortions, especially on generated faces. To address it, we propose a cross-scale diffusion that models the joint probability distribution of global full-body shape and local facial characteristics, enabling detailed and identity-preserved novel-view generation without any geometric distortion. Moreover, to enhance cross-view body shape consistency of varied human poses, we condition the generative model on parametric models like SMPL-X, which provide body priors and prevent unnatural views inconsistent with human anatomy. Leveraging the generated multi-view normal and color images, we present SMPLX-initialized explicit human carving to recover realistic textured human meshes efficiently. Extensive experimental results and quantitative evaluations on CAPE and THuman2.1 datasets demonstrate PSHumans superiority in geometry details, texture fidelity, and generalization capability.
Related papers
- Single Image, Any Face: Generalisable 3D Face Generation [59.9369171926757]
We propose a novel model, Gen3D-Face, which generates 3D human faces with unconstrained single image input.
To the best of our knowledge, this is the first attempt and benchmark for creating photorealistic 3D human face avatars from single images.
arXiv Detail & Related papers (2024-09-25T14:56:37Z) - MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement [23.707586182294932]
Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data or 3D inconsistencies for a lack of comprehensive multi-view knowledge.
We introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image.
arXiv Detail & Related papers (2024-08-26T12:10:52Z) - HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors [47.62426718293504]
HumanSplat predicts the 3D Gaussian Splatting properties of any human from a single input image.
HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis.
arXiv Detail & Related papers (2024-06-18T10:05:33Z) - Generalizable Human Gaussians from Single-View Image [52.100234836129786]
We introduce a single-view generalizable Human Gaussian Model (HGM)
Our approach uses a ControlNet to refine rendered back-view images from coarse predicted human Gaussians.
To mitigate the potential generation of unrealistic human poses and shapes, we incorporate human priors from the SMPL-X model as a dual branch.
arXiv Detail & Related papers (2024-06-10T06:38:11Z) - Towards Effective Usage of Human-Centric Priors in Diffusion Models for
Text-based Human Image Generation [24.49857926071974]
Vanilla text-to-image diffusion models struggle with generating accurate human images.
Existing methods address this issue mostly by fine-tuning the model with extra images or adding additional controls.
This paper explores the integration of human-centric priors directly into the model fine-tuning stage.
arXiv Detail & Related papers (2024-03-08T11:59:32Z) - Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM [29.13412037370585]
We present Human-LRM, a diffusion-guided feed-forward model that predicts the implicit field of a human from a single image.
Our method is able to capture human without any template prior, e.g., SMPL, and effectively enhance occluded parts with rich and realistic details.
arXiv Detail & Related papers (2024-01-22T18:08:22Z) - HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion [114.15397904945185]
We propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts.
Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network.
Our framework yields the state-of-the-art performance, generating hyper-realistic human images under diverse scenarios.
arXiv Detail & Related papers (2023-10-12T17:59:34Z) - GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from
Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.