Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM
- URL: http://arxiv.org/abs/2401.12175v2
- Date: Thu, 14 Mar 2024 08:12:46 GMT
- Title: Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM
- Authors: Zhenzhen Weng, Jingyuan Liu, Hao Tan, Zhan Xu, Yang Zhou, Serena Yeung-Levy, Jimei Yang
- Abstract summary: We present Human-LRM, a diffusion-guided feed-forward model that predicts the implicit field of a human from a single image.
Our method is able to capture humans without any template prior, e.g., SMPL, and effectively enhance occluded parts with rich and realistic details.
- Score: 29.13412037370585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing 3D humans from a single image has been extensively investigated. However, existing approaches often fall short of capturing fine geometry and appearance details, hallucinating occluded parts with plausible details, and generalizing to unseen and in-the-wild datasets. We present Human-LRM, a diffusion-guided feed-forward model that predicts the implicit field of a human from a single image. Leveraging the power of a state-of-the-art reconstruction model (i.e., LRM) and generative model (i.e., Stable Diffusion), our method is able to capture humans without any template prior, e.g., SMPL, and effectively enhance occluded parts with rich and realistic details. Our approach first uses a single-view LRM model with an enhanced geometry decoder to obtain a triplane NeRF representation. The novel view renderings from the triplane NeRF provide strong geometry and color priors, from which we generate photo-realistic details for the occluded parts using a diffusion model. The generated multiple views then enable reconstruction with high-quality geometry and appearance, leading to superior overall performance compared to all existing human reconstruction methods.
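The abstract describes a four-stage pipeline: image to triplane NeRF via LRM, novel view rendering, diffusion-based enhancement of occluded regions, and multi-view reconstruction. A minimal sketch of that data flow is below; all function names, tensor shapes, and the view count are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the Human-LRM pipeline described in the abstract.
# Every function here is a stand-in stub; names, shapes, and the number of
# novel views are assumptions for illustration only.
import numpy as np

def lrm_encode_triplane(image, plane_res=64, channels=8):
    """Stand-in for the single-view LRM with enhanced geometry decoder:
    maps an input image to a triplane NeRF (3 axis-aligned feature planes)."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((3, channels, plane_res, plane_res))

def render_novel_views(triplane, n_views=4, hw=(32, 32)):
    """Stand-in for volume-rendering the triplane NeRF from novel camera
    poses; these coarse renderings supply the geometry and color prior."""
    h, w = hw
    base = float(np.clip(triplane.mean(), 0.0, 1.0))
    return [np.full((h, w, 3), base) for _ in range(n_views)]

def diffusion_enhance(view):
    """Stand-in for the conditional diffusion model that generates
    photo-realistic detail for occluded regions of each rendered view."""
    return np.clip(view + 0.1, 0.0, 1.0)  # placeholder "added detail"

def reconstruct_from_views(views):
    """Stand-in for the final reconstruction from the generated views."""
    return {"n_views": len(views), "mesh": "placeholder"}

def human_lrm_pipeline(image):
    triplane = lrm_encode_triplane(image)               # 1) image -> triplane NeRF
    coarse = render_novel_views(triplane)               # 2) render novel-view priors
    detailed = [diffusion_enhance(v) for v in coarse]   # 3) diffusion fills occlusions
    return reconstruct_from_views(detailed)             # 4) multi-view reconstruction

result = human_lrm_pipeline(np.zeros((256, 256, 3)))
```

The key design point the abstract emphasizes is that stage 3 operates on renderings rather than on the input image, so the diffusion model is conditioned on a consistent geometry and color prior instead of hallucinating views from scratch.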
Related papers
- Generalizable Human Gaussians from Single-View Image [54.712838657788566]
We propose single-view generalizable Human Gaussian model (HGM), a diffusion-guided framework for 3D human modeling from a single image.
Although effective in hallucinating the unobserved views, such approaches may generate unrealistic human poses and shapes due to the lack of supervision.
We validate our approach on publicly available datasets and demonstrate that it significantly surpasses state-of-the-art methods in terms of PSNR and SSIM.
arXiv Detail & Related papers (2024-06-10T06:38:11Z) - CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction
Model [37.75256020559125]
We present a high-fidelity feed-forward single image-to-3D generative model.
We highlight the necessity of integrating geometric priors into network design.
Our model delivers a high-fidelity textured mesh from an image in just 10 seconds, without any test-time optimization.
arXiv Detail & Related papers (2024-03-08T04:25:29Z) - Deceptive-Human: Prompt-to-NeRF 3D Human Generation with 3D-Consistent
Synthetic Images [67.31920821192323]
Deceptive-Human is a novel framework that capitalizes on state-of-the-art controllable diffusion models (e.g., ControlNet) to generate a high-quality, controllable 3D human NeRF.
Our method is versatile and readily accommodates a variety of inputs, including a text prompt and additional data such as a 3D mesh, poses, and seed images.
The resulting 3D human NeRF model empowers the synthesis of highly photorealistic views from 360-degree perspectives.
arXiv Detail & Related papers (2023-11-27T15:49:41Z) - Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and
Reconstruction [77.69363640021503]
3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images.
We present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects.
arXiv Detail & Related papers (2023-04-13T17:59:01Z) - GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from
Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z) - NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as
General Image Priors [24.05480789681139]
We propose NeRDi, a single-view NeRF synthesis framework with general image priors from 2D diffusion models.
We leverage off-the-shelf vision-language models and introduce a two-section language guidance as conditioning inputs to the diffusion model.
We also demonstrate our generalizability in zero-shot NeRF synthesis for in-the-wild images.
arXiv Detail & Related papers (2022-12-06T19:00:07Z) - NeuralReshaper: Single-image Human-body Retouching with Deep Neural
Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
To address the lack of paired training data, we introduce a novel self-supervised strategy to train our network.
arXiv Detail & Related papers (2022-03-20T09:02:13Z) - 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous
Image Data [77.57798334776353]
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views.
We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses.
We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans.
arXiv Detail & Related papers (2020-11-02T13:55:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.