PaintHuman: Towards High-fidelity Text-to-3D Human Texturing via
Denoised Score Distillation
- URL: http://arxiv.org/abs/2310.09458v1
- Date: Sat, 14 Oct 2023 00:37:16 GMT
- Title: PaintHuman: Towards High-fidelity Text-to-3D Human Texturing via
Denoised Score Distillation
- Authors: Jianhui Yu, Hao Zhu, Liming Jiang, Chen Change Loy, Weidong Cai, Wayne
Wu
- Abstract summary: Recent advances in text-to-3D human generation have been groundbreaking.
We propose a model called PaintHuman to address the challenges from two aspects.
We use the depth map as guidance to ensure realistic, semantically aligned textures.
- Score: 89.09455618184239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in zero-shot text-to-3D human generation, which employ the
human model prior (e.g., SMPL) or Score Distillation Sampling (SDS) with
pre-trained text-to-image diffusion models, have been groundbreaking. However,
SDS may provide inaccurate gradient directions under the weak diffusion
guidance, as it tends to produce over-smoothed results and generate body
textures that are inconsistent with the detailed mesh geometry. Therefore,
directly leveraging existing strategies for high-fidelity text-to-3D human
texturing is challenging. In this work, we propose a model called PaintHuman to
address the challenges from two aspects. We first propose a novel score
function, Denoised Score Distillation (DSD), which directly modifies the SDS by
introducing negative gradient components to iteratively correct the gradient
direction and generate high-quality textures. In addition, we use the depth map
as geometric guidance to ensure the texture is semantically aligned to human
mesh surfaces. To guarantee the quality of rendered results, we employ
geometry-aware networks to predict surface materials and render realistic human
textures. Extensive experiments, benchmarked against state-of-the-art methods,
validate the efficacy of our approach.
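For context, the baseline SDS gradient that the abstract refers to can be sketched as follows. This is a minimal illustration of standard SDS only, taken with respect to the rendered image, and it is not the paper's DSD: the abstract does not specify the negative gradient components, so they are not reproduced here. The names `sds_gradient` and `point_mass_noise_pred` are hypothetical, and the toy noise predictor stands in for a pre-trained text-to-image diffusion model under the assumption that all data sits at a single target image.

```python
import math
import random


def point_mass_noise_pred(target, alpha_bar_t):
    """Toy stand-in for a diffusion model (assumption, not a real network).

    If all data equals `target`, the exact noise in x_t = a*x0 + s*eps
    is eps = (x_t - a*target) / s, so that is what this predictor returns.
    """
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)

    def predict(x_t):
        return [(xt - a * m) / s for xt, m in zip(x_t, target)]

    return predict


def sds_gradient(x, noise_pred_fn, alpha_bar_t, weight, rng):
    """Standard SDS gradient w.r.t. the rendered image x.

    Computes weight * (eps_hat - eps), where eps is the sampled noise and
    eps_hat is the model's noise prediction on the noised image x_t.
    In a full pipeline this would be back-propagated through the renderer.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    a = math.sqrt(alpha_bar_t)
    s = math.sqrt(1.0 - alpha_bar_t)
    # Forward diffusion: noise the rendered image to timestep t.
    x_t = [a * xi + s * ei for xi, ei in zip(x, eps)]
    eps_hat = noise_pred_fn(x_t)
    return [weight * (eh - e) for eh, e in zip(eps_hat, eps)]


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.0, 0.0]        # hypothetical "ideal" texture values
    rendered = [1.0, 0.0]      # hypothetical current rendering
    predictor = point_mass_noise_pred(target, alpha_bar_t=0.5)
    grad = sds_gradient(rendered, predictor, 0.5, 1.0, rng)
    # A gradient-descent step along -grad moves `rendered` toward `target`.
    print(grad)
```

In this toy setting the sampled noise cancels and the gradient reduces to a scaled difference between the rendering and the target, which is why descending it pulls the rendering toward the model's preferred image. With a real, imperfect noise predictor that cancellation does not occur, and the gradient direction is noisier.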
Related papers
- DreamPolish: Domain Score Distillation With Progressive Geometry Generation [66.94803919328815]
We introduce DreamPolish, a text-to-3D generation model that excels in producing refined geometry and high-quality textures.
In the geometry construction phase, our approach leverages multiple neural representations to enhance the stability of the synthesis process.
In the texture generation phase, we introduce a novel score distillation objective, namely domain score distillation (DSD), to guide neural representations toward such a domain.
arXiv Detail & Related papers (2024-11-03T15:15:01Z)
- FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis [51.193297565630886]
The challenge of accurately inferring texture remains, particularly in obscured areas such as the back of a person in frontal-view images.
This limitation in texture prediction largely stems from the scarcity of large-scale and diverse 3D datasets.
We propose leveraging extensive 2D fashion datasets to enhance both texture and shape prediction in 3D human digitization.
arXiv Detail & Related papers (2024-10-13T01:25:05Z)
- Semantic Score Distillation Sampling for Compositional Text-to-3D Generation [28.88237230872795]
Generating high-quality 3D assets from textual descriptions remains a pivotal challenge in computer graphics and vision research.
We introduce a novel SDS approach, designed to improve the expressiveness and accuracy of compositional text-to-3D generation.
Our approach integrates new semantic embeddings that maintain consistency across different rendering views.
By leveraging explicit semantic guidance, our method unlocks the compositional capabilities of existing pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-11T17:26:00Z)
- Text-Driven Diverse Facial Texture Generation via Progressive Latent-Space Refinement [34.00893761125383]
We propose a progressive latent space refinement approach to bootstrap from 3D Morphable Models (3DMMs)-based texture maps generated from facial images.
Our method outperforms existing 3D texture generation methods regarding photo-realistic quality, diversity, and efficiency.
arXiv Detail & Related papers (2024-04-15T08:04:44Z)
- StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z)
- ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis [49.28239918969784]
We introduce a texture-consistent back view synthesis module that could transfer the reference image content to the back view.
We also propose a visibility-aware patch consistency regularization for texture mapping and refinement combined with the synthesized back view texture.
arXiv Detail & Related papers (2023-11-28T13:55:53Z)
- EucliDreamer: Fast and High-Quality Texturing for 3D Models with Stable Diffusion Depth [5.158983929861116]
We present a novel method to generate textures for 3D models given text prompts and 3D meshes.
Additional depth information is taken into account to perform the Score Distillation Sampling (SDS) process.
arXiv Detail & Related papers (2023-11-27T06:55:53Z)
- HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation [41.82589219009301]
We propose HumanNorm, a novel approach for high-quality and realistic 3D human generation.
The main idea is to enhance the model's 2D perception of 3D geometry by learning a normal-adapted diffusion model and a normal-aligned diffusion model.
HumanNorm outperforms existing text-to-3D methods in both geometry and texture quality.
arXiv Detail & Related papers (2023-10-02T17:59:17Z)
- 3D Dense Geometry-Guided Facial Expression Synthesis by Adversarial Learning [54.24887282693925]
We propose a novel framework to exploit 3D dense (depth and surface normals) information for expression manipulation.
We use an off-the-shelf state-of-the-art 3D reconstruction model to estimate the depth and create a large-scale RGB-Depth dataset.
Our experiments demonstrate that the proposed method outperforms the competitive baseline and existing methods by a large margin.
arXiv Detail & Related papers (2020-09-30T17:12:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.