HumanNorm: Learning Normal Diffusion Model for High-quality and
Realistic 3D Human Generation
- URL: http://arxiv.org/abs/2310.01406v2
- Date: Wed, 29 Nov 2023 16:23:33 GMT
- Title: HumanNorm: Learning Normal Diffusion Model for High-quality and
Realistic 3D Human Generation
- Authors: Xin Huang, Ruizhi Shao, Qi Zhang, Hongwen Zhang, Ying Feng, Yebin Liu,
Qing Wang
- Abstract summary: We propose HumanNorm, a novel approach for high-quality and realistic 3D human generation.
The main idea is to enhance the model's 2D perception of 3D geometry by learning a normal-adapted diffusion model and a normal-aligned diffusion model.
HumanNorm outperforms existing text-to-3D methods in both geometry and texture quality.
- Score: 41.82589219009301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent text-to-3D methods employing diffusion models have made significant
advancements in 3D human generation. However, these approaches face challenges
due to the limitations of text-to-image diffusion models, which lack an
understanding of 3D structures. Consequently, these methods struggle to achieve
high-quality human generation, resulting in overly smooth geometry and cartoon-like
appearances. In this paper, we propose HumanNorm, a novel approach for
high-quality and realistic 3D human generation. The main idea is to enhance the
model's 2D perception of 3D geometry by learning a normal-adapted diffusion
model and a normal-aligned diffusion model. The normal-adapted diffusion model
can generate high-fidelity normal maps corresponding to user prompts with
view-dependent and body-aware text. The normal-aligned diffusion model learns
to generate color images aligned with the normal maps, thereby transforming
physical geometry details into realistic appearance. Leveraging the proposed
normal diffusion model, we devise a progressive geometry generation strategy
and a multi-step Score Distillation Sampling (SDS) loss to enhance the
performance of 3D human generation. Comprehensive experiments substantiate
HumanNorm's ability to generate 3D humans with intricate geometry and realistic
appearances. HumanNorm outperforms existing text-to-3D methods in both geometry
and texture quality. The project page of HumanNorm is
https://humannorm.github.io/.
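
The multi-step SDS loss mentioned in the abstract replaces the single-step noise residual of vanilla SDS with a target obtained by running the diffusion model for several denoising steps. Below is a minimal sketch of that general idea, assuming a DDPM/DDIM-style noise-prediction model; `unet`, `alphas_cumprod`, and the other names are illustrative placeholders, not HumanNorm's actual implementation.

```python
# Sketch of a multi-step SDS-style loss (illustrative only; not the paper's exact formulation).
# Assumes a hypothetical noise-prediction model `unet(x_t, t, cond)` and a DDPM alpha schedule.
import torch

def multi_step_sds_loss(unet, alphas_cumprod, rendered, cond, t_start, n_steps=4):
    """rendered: image or latent from the differentiable 3D renderer, shape (B, C, H, W)."""
    noise = torch.randn_like(rendered)
    a_t = alphas_cumprod[t_start].view(1, 1, 1, 1)
    # Forward-diffuse the rendering to timestep t_start (standard DDPM noising).
    x_t = a_t.sqrt() * rendered + (1.0 - a_t).sqrt() * noise

    # Run several denoising steps without gradients to build a cleaner target
    # than the single-step prediction used by vanilla SDS.
    with torch.no_grad():
        x = x_t
        timesteps = torch.linspace(t_start, 0, n_steps + 1).long().tolist()[:-1]
        for i, t in enumerate(timesteps):
            a = alphas_cumprod[t].view(1, 1, 1, 1)
            eps = unet(x, t, cond)                                   # predicted noise
            x0_pred = (x - (1.0 - a).sqrt() * eps) / a.sqrt()        # predicted clean sample
            if i + 1 < len(timesteps):
                a_next = alphas_cumprod[timesteps[i + 1]].view(1, 1, 1, 1)
                x = a_next.sqrt() * x0_pred + (1.0 - a_next).sqrt() * eps  # DDIM (eta=0) step
            else:
                x = x0_pred
        target = x

    # Image-space loss between the rendering and the multi-step denoised target;
    # gradients flow back into the 3D representation only through `rendered`.
    return torch.nn.functional.mse_loss(rendered, target)
```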
Related papers
- CAD: Photorealistic 3D Generation via Adversarial Distillation [28.07049413820128]
We propose a novel learning paradigm for 3D synthesis that utilizes pre-trained diffusion models.
Our method unlocks the generation of high-fidelity and photorealistic 3D content conditioned on a single image and prompt.
arXiv Detail & Related papers (2023-12-11T18:59:58Z)
- HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion [53.1558345421646]
We propose HumanRef, a 3D human generation framework from a single-view input.
To ensure the generated 3D model is photorealistic and consistent with the input image, HumanRef introduces a novel method called reference-guided score distillation sampling.
Experimental results demonstrate that HumanRef outperforms state-of-the-art methods in generating 3D clothed humans.
arXiv Detail & Related papers (2023-11-28T17:06:28Z)
- RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D [31.77212284992657]
We learn a generalizable Normal-Depth diffusion model for 3D generation.
We introduce an albedo diffusion model to impose data-driven constraints on the albedo component.
Our experiments show that when integrated into existing text-to-3D pipelines, our models significantly enhance detail richness.
arXiv Detail & Related papers (2023-11-28T16:22:33Z)
- PaintHuman: Towards High-fidelity Text-to-3D Human Texturing via Denoised Score Distillation [89.09455618184239]
Recent advances in text-to-3D human generation have been groundbreaking.
We propose a model called PaintHuman to address these challenges from two aspects.
We use the depth map as guidance to ensure realistic, semantically aligned textures.
arXiv Detail & Related papers (2023-10-14T00:37:16Z)
- EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior [59.25950280610409]
We propose a robust high-quality 3D content generation pipeline by exploiting orthogonal-view image guidance.
In this paper, we introduce a novel 2D diffusion model that generates an image consisting of four sub-images based on the given text prompt.
We also present a 3D synthesis network that can further improve the details of the generated 3D contents.
arXiv Detail & Related papers (2023-08-25T07:39:26Z)
- HumanLiff: Layer-wise 3D Human Generation with Diffusion Model [55.891036415316876]
Existing 3D human generative models mainly generate a clothed 3D human as an undetachable 3D model in a single pass.
We propose HumanLiff, the first layer-wise 3D human generative model with a unified diffusion process.
arXiv Detail & Related papers (2023-08-18T17:59:04Z)
- ZeroAvatar: Zero-shot 3D Avatar Generation from a Single Image [17.285152757066527]
We present ZeroAvatar, a method that introduces an explicit 3D human body prior into the optimization process.
We show that ZeroAvatar significantly enhances the robustness and 3D consistency of optimization-based image-to-3D avatar generation.
arXiv Detail & Related papers (2023-05-25T18:23:20Z)
- DreamFusion: Text-to-3D using 2D Diffusion [52.52529213936283]
Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs.
In this work, we circumvent the lack of large-scale labeled 3D data by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.
Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
arXiv Detail & Related papers (2022-09-29T17:50:40Z)
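
For context, the Score Distillation Sampling gradient that DreamFusion distills from the frozen image diffusion model is commonly written as follows (a standard statement of the idea, not quoted from this listing):

```latex
\nabla_{\theta} \mathcal{L}_{\mathrm{SDS}}\big(\phi, \mathbf{x} = g(\theta)\big)
\triangleq \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
\big(\hat{\epsilon}_{\phi}(\mathbf{x}_t; y, t) - \epsilon\big)\,
\frac{\partial \mathbf{x}}{\partial \theta} \right]
```

Here g(θ) is the differentiable renderer producing image x, x_t is the noised rendering at timestep t, y is the text prompt, ε̂_φ is the frozen model's noise prediction, and w(t) is a timestep-dependent weight. HumanNorm's multi-step SDS loss builds on this formulation.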
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.