Deceptive-Human: Prompt-to-NeRF 3D Human Generation with 3D-Consistent
Synthetic Images
- URL: http://arxiv.org/abs/2311.16499v1
- Date: Mon, 27 Nov 2023 15:49:41 GMT
- Title: Deceptive-Human: Prompt-to-NeRF 3D Human Generation with 3D-Consistent
Synthetic Images
- Authors: Shiu-hong Kao, Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang
- Abstract summary: Deceptive-Human is a novel framework capitalizing state-of-the-art control diffusion models (e.g., ControlNet) to generate a high-quality controllable 3D human NeRF.
Our method is versatile and readily accommodating, including a text prompt and additional data such as 3D mesh, poses, and seed images.
The resulting 3D human NeRF model empowers the synthesis of highly photorealistic views from 360-degree perspectives.
- Score: 67.31920821192323
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents Deceptive-Human, a novel Prompt-to-NeRF framework
capitalizing state-of-the-art control diffusion models (e.g., ControlNet) to
generate a high-quality controllable 3D human NeRF. Different from direct 3D
generative approaches, e.g., DreamFusion and DreamHuman, Deceptive-Human
employs a progressive refinement technique to elevate the reconstruction
quality. This is achieved by utilizing high-quality synthetic human images
generated through the ControlNet with view-consistent loss. Our method is
versatile and readily extensible, accommodating multimodal inputs, including a
text prompt and additional data such as 3D mesh, poses, and seed images. The
resulting 3D human NeRF model empowers the synthesis of highly photorealistic
novel views from 360-degree perspectives. The key to our Deceptive-Human for
hallucinating multi-view consistent synthetic human images lies in our
progressive finetuning strategy. This strategy involves iteratively enhancing
views using the provided multimodal inputs at each intermediate step to improve
the human NeRF model. Within this iterative refinement process, view-dependent
appearances are systematically eliminated to prevent interference with the
underlying density estimation. Extensive qualitative and quantitative
experimental comparison shows that our deceptive human models achieve
state-of-the-art application quality.
Related papers
- SemanticHuman-HD: High-Resolution Semantic Disentangled 3D Human Generation [12.063815354055052]
We introduce SemanticHuman-HD, the first method to achieve semantic disentangled human image synthesis.
SemanticHuman-HD is also the first method to achieve 3D-aware image synthesis at $10242$ resolution.
Our method opens up exciting possibilities for various applications, including 3D garment generation, semantic-aware image synthesis, controllable image synthesis.
arXiv Detail & Related papers (2024-03-15T10:18:56Z) - Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation [14.064983137553353]
We aim to enhance the quality and functionality of generative diffusion models for the task of creating controllable, photorealistic human avatars.
We achieve this by integrating a 3D morphable model into the state-of-the-art multi-view-consistent diffusion approach.
Our proposed framework is the first diffusion model to enable the creation of fully 3D-consistent, animatable, and photorealistic human avatars.
arXiv Detail & Related papers (2024-01-09T18:59:04Z) - MVHuman: Tailoring 2D Diffusion with Multi-view Sampling For Realistic
3D Human Generation [45.88714821939144]
We present an alternative scheme named MVHuman to generate human radiance fields from text guidance.
Our core is a multi-view sampling strategy to tailor the denoising processes of the pre-trained network for generating consistent multi-view images.
arXiv Detail & Related papers (2023-12-15T11:56:26Z) - HumanRef: Single Image to 3D Human Generation via Reference-Guided
Diffusion [53.1558345421646]
We propose HumanRef, a 3D human generation framework from a single-view input.
To ensure the generated 3D model is photorealistic and consistent with the input image, HumanRef introduces a novel method called reference-guided score distillation sampling.
Experimental results demonstrate that HumanRef outperforms state-of-the-art methods in generating 3D clothed humans.
arXiv Detail & Related papers (2023-11-28T17:06:28Z) - HumanNorm: Learning Normal Diffusion Model for High-quality and
Realistic 3D Human Generation [41.82589219009301]
We propose HumanNorm, a novel approach for high-quality and realistic 3D human generation.
The main idea is to enhance the model's 2D perception of 3D geometry by learning a normal-adapted diffusion model and a normal-aligned diffusion model.
HumanNorm outperforms existing text-to-3D methods in both geometry and texture quality.
arXiv Detail & Related papers (2023-10-02T17:59:17Z) - Refining 3D Human Texture Estimation from a Single Image [3.8761064607384195]
Estimating 3D human texture from a single image is essential in graphics and vision.
We propose a framework that adaptively samples the input by a deformable convolution where offsets are learned via a deep neural network.
arXiv Detail & Related papers (2023-03-06T19:53:50Z) - Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using
Pixel-aligned Reconstruction Priors [56.192682114114724]
Get3DHuman is a novel 3D human framework that can significantly boost the realism and diversity of the generated outcomes.
Our key observation is that the 3D generator can profit from human-related priors learned through 2D human generators and 3D reconstructors.
arXiv Detail & Related papers (2023-02-02T15:37:46Z) - HDHumans: A Hybrid Approach for High-fidelity Digital Humans [107.19426606778808]
HDHumans is the first method for HD human character synthesis that jointly produces an accurate and temporally coherent 3D deforming surface.
Our method is carefully designed to achieve a synergy between classical surface deformation and neural radiance fields (NeRF)
arXiv Detail & Related papers (2022-10-21T14:42:11Z) - THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers [67.8628917474705]
THUNDR is a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people.
We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models.
We observe very solid 3d reconstruction performance for difficult human poses collected in the wild.
arXiv Detail & Related papers (2021-06-17T09:09:24Z) - Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present deep neural network methodology to reconstruct the 3d pose and shape of people, given an input RGB image.
We rely on a recently introduced, expressivefull body statistical 3d human model, GHUM, trained end-to-end.
Central to our methodology, is a learning to learn and optimize approach, referred to as HUmanNeural Descent (HUND), which avoids both second-order differentiation.
arXiv Detail & Related papers (2020-08-16T13:38:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.