Related papers: PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion

PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion

URL: http://arxiv.org/abs/2409.10141v1
Date: Mon, 16 Sep 2024 10:13:06 GMT
Title: PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion
Authors: Peng Li, Wangguandong Zheng, Yuan Liu, Tao Yu, Yangguang Li, Xingqun Qi, Mengfei Li, Xiaowei Chi, Siyu Xia, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo,
Abstract summary: PSHuman is a novel framework that explicitly reconstructs human meshes utilizing priors from the multiview diffusion model. It is found that directly applying multiview diffusion on single-view human images leads to severe geometric distortions. To enhance cross-view body shape consistency of varied human poses, we condition the generative model on parametric models like SMPL-X.
Score: 43.850899288337025
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Detailed and photorealistic 3D human modeling is essential for various applications and has seen tremendous progress. However, full-body reconstruction from a monocular RGB image remains challenging due to the ill-posed nature of the problem and sophisticated clothing topology with self-occlusions. In this paper, we propose PSHuman, a novel framework that explicitly reconstructs human meshes utilizing priors from the multiview diffusion model. It is found that directly applying multiview diffusion on single-view human images leads to severe geometric distortions, especially on generated faces. To address it, we propose a cross-scale diffusion that models the joint probability distribution of global full-body shape and local facial characteristics, enabling detailed and identity-preserved novel-view generation without any geometric distortion. Moreover, to enhance cross-view body shape consistency of varied human poses, we condition the generative model on parametric models like SMPL-X, which provide body priors and prevent unnatural views inconsistent with human anatomy. Leveraging the generated multi-view normal and color images, we present SMPLX-initialized explicit human carving to recover realistic textured human meshes efficiently. Extensive experimental results and quantitative evaluations on CAPE and THuman2.1 datasets demonstrate PSHumans superiority in geometry details, texture fidelity, and generalization capability.

Related papers

DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior [82.9526308672547]
We present DPoser-X, a diffusion-based prior model for 3D whole-body human poses.<n>Our approach unifies various pose-centric tasks as inverse problems, solving them through variational diffusion sampling.<n>Our model consistently outperforms state-of-the-art alternatives, establishing a new benchmark for whole-body human pose prior modeling.
arXiv Detail & Related papers (2025-08-01T12:56:39Z)
HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration [29.03216532351979]
We introduce textbfHumanDreamer-X, a novel framework that integrates multi-view human generation and reconstruction into a unified pipeline. In this framework, 3D Gaussian Splatting serves as an explicit 3D representation to provide initial geometry and appearance priority. We also propose an attention modulation strategy that effectively enhances geometric details identity consistency across multi-view.
arXiv Detail & Related papers (2025-04-04T15:35:14Z)
GAS: Generative Avatar Synthesis from a Single Image [54.95198111659466]
We present a framework for synthesizing view-consistent and temporally coherent avatars from a single image.<n>Our approach combines the reconstruction power of regression-based 3D human reconstruction with the generative capabilities of a diffusion model.
arXiv Detail & Related papers (2025-02-10T19:00:39Z)
Human Multi-View Synthesis from a Single-View Model:Transferred Body and Face Representations [7.448124739584319]
We propose an innovative framework that leverages transferred body and facial representations for multi-view human synthesis. Specifically, we use a single-view model pretrained on a large-scale human dataset to develop a multi-view body representation. Our approach outperforms the current state-of-the-art methods, achieving superior performance in multi-view human synthesis.
arXiv Detail & Related papers (2024-12-04T04:02:17Z)
GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data [61.05815629606135]
Given a single in-the-wild human photo, it remains a challenging task to reconstruct a high-fidelity 3D human model. GeneMAN builds upon a comprehensive collection of high-quality human data. GeneMAN could generate high-quality 3D human models from a single image input, outperforming prior state-of-the-art methods.
arXiv Detail & Related papers (2024-11-27T18:59:54Z)
DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images. Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z)
Single Image, Any Face: Generalisable 3D Face Generation [59.9369171926757]
We propose a novel model, Gen3D-Face, which generates 3D human faces with unconstrained single image input. To the best of our knowledge, this is the first attempt and benchmark for creating photorealistic 3D human face avatars from single images.
arXiv Detail & Related papers (2024-09-25T14:56:37Z)
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement [23.707586182294932]
Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data or 3D inconsistencies for a lack of comprehensive multi-view knowledge. We introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image.
arXiv Detail & Related papers (2024-08-26T12:10:52Z)
HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors [47.62426718293504]
HumanSplat predicts the 3D Gaussian Splatting properties of any human from a single input image. HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis.
arXiv Detail & Related papers (2024-06-18T10:05:33Z)
Generalizable Human Gaussians from Single-View Image [52.100234836129786]
We introduce a single-view generalizable Human Gaussian Model (HGM) Our approach uses a ControlNet to refine rendered back-view images from coarse predicted human Gaussians. To mitigate the potential generation of unrealistic human poses and shapes, we incorporate human priors from the SMPL-X model as a dual branch.
arXiv Detail & Related papers (2024-06-10T06:38:11Z)
Diffusion Models are Efficient Data Generators for Human Mesh Recovery [55.37787289869703]
We show that synthetic data created by generative models is complementary to CG-rendered data.<n>We propose an effective data generation pipeline based on recent diffusion models, termed HumanWild.<n>Our work could pave the way for scaling up 3D human recovery to in-the-wild scenes.
arXiv Detail & Related papers (2024-03-17T06:31:16Z)
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation [24.49857926071974]
Vanilla text-to-image diffusion models struggle with generating accurate human images. Existing methods address this issue mostly by fine-tuning the model with extra images or adding additional controls. This paper explores the integration of human-centric priors directly into the model fine-tuning stage.
arXiv Detail & Related papers (2024-03-08T11:59:32Z)
Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM [29.13412037370585]
We present Human-LRM, a diffusion-guided feed-forward model that predicts the implicit field of a human from a single image. Our method is able to capture human without any template prior, e.g., SMPL, and effectively enhance occluded parts with rich and realistic details.
arXiv Detail & Related papers (2024-01-22T18:08:22Z)
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion [114.15397904945185]
We propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts. Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network. Our framework yields the state-of-the-art performance, generating hyper-realistic human images under diverse scenarios.
arXiv Detail & Related papers (2023-10-12T17:59:34Z)
GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images. Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.