PaintHuman: Towards High-fidelity Text-to-3D Human Texturing via
Denoised Score Distillation
- URL: http://arxiv.org/abs/2310.09458v1
- Date: Sat, 14 Oct 2023 00:37:16 GMT
- Title: PaintHuman: Towards High-fidelity Text-to-3D Human Texturing via
Denoised Score Distillation
- Authors: Jianhui Yu, Hao Zhu, Liming Jiang, Chen Change Loy, Weidong Cai, Wayne
Wu
- Abstract summary: Recent advances in text-to-3D human generation have been groundbreaking.
We propose a model called PaintHuman to address the challenges from two aspects.
We use the depth map as guidance to ensure realistic, semantically aligned textures.
- Score: 89.09455618184239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in zero-shot text-to-3D human generation, which employ a
human model prior (e.g., SMPL) or Score Distillation Sampling (SDS) with
pre-trained text-to-image diffusion models, have been groundbreaking. However,
SDS may provide inaccurate gradient directions under weak diffusion
guidance, as it tends to produce over-smoothed results and generate body
textures that are inconsistent with the detailed mesh geometry. Directly
leveraging existing strategies for high-fidelity text-to-3D human
texturing is therefore challenging. In this work, we propose a model called
PaintHuman to address these challenges from two aspects. We first propose a novel score
function, Denoised Score Distillation (DSD), which directly modifies SDS by
introducing negative gradient components to iteratively correct the gradient
direction and generate high-quality textures. In addition, we use the depth map
as geometric guidance to ensure the texture is semantically aligned with the
human mesh surfaces. To guarantee the quality of rendered results, we employ
geometry-aware networks to predict surface materials and render realistic human
textures. Extensive experiments, benchmarked against state-of-the-art methods,
validate the efficacy of our approach.
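The abstract's core idea, replacing the raw SDS gradient with a version corrected by negative gradient components, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the negative component is assumed here to come from a negative/unconditional noise prediction, and the weighting `lam` is a hypothetical parameter.

```python
def sds_gradient(noise_pred, noise, weight):
    """Standard Score Distillation Sampling (SDS) gradient:
    w(t) * (eps_hat(x_t; y, t) - eps), where eps_hat is the diffusion
    model's noise prediction and eps is the injected noise."""
    return weight * (noise_pred - noise)


def dsd_gradient(noise_pred_pos, noise_pred_neg, noise, weight, lam=0.5):
    """Denoised Score Distillation (DSD), sketched: subtract a negative
    gradient component (here derived from a hypothetical negative or
    unconditional noise prediction) to correct the SDS direction.
    `lam` is an assumed weighting, not taken from the paper."""
    positive = noise_pred_pos - noise
    negative = noise_pred_neg - noise
    return weight * (positive - lam * negative)
```

Intuitively, when the positive and negative predictions agree, the corrective term cancels part of the update, damping the over-smoothing that the abstract attributes to weak diffusion guidance.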
Related papers
- Text-Driven Diverse Facial Texture Generation via Progressive Latent-Space Refinement [34.00893761125383]
We propose a progressive latent space refinement approach to bootstrap from 3D Morphable Models (3DMMs)-based texture maps generated from facial images.
Our method outperforms existing 3D texture generation methods regarding photo-realistic quality, diversity, and efficiency.
arXiv Detail & Related papers (2024-04-15T08:04:44Z)
- Semantic Human Mesh Reconstruction with Textures [40.613861470128256]
SHERT is a novel pipeline that reconstructs semantic human meshes with textures and high-precision details.
Our reconstructed meshes have stable UV unwrapping, high-quality triangle meshes, and consistent semantic information.
arXiv Detail & Related papers (2024-03-05T00:34:05Z)
- NeuSD: Surface Completion with Multi-View Text-to-Image Diffusion [56.98287481620215]
We present a novel method for 3D surface reconstruction from multiple images where only a part of the object of interest is captured.
Our approach builds on two recent developments: surface reconstruction using neural radiance fields for the reconstruction of the visible parts of the surface, and guidance of pre-trained 2D diffusion models in the form of Score Distillation Sampling (SDS) to complete the shape in unobserved regions in a plausible manner.
arXiv Detail & Related papers (2023-12-07T19:30:55Z)
- StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z)
- ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis [49.28239918969784]
We introduce a texture-consistent back view synthesis module that could transfer the reference image content to the back view.
We also propose a visibility-aware patch consistency regularization for texture mapping and refinement combined with the synthesized back view texture.
arXiv Detail & Related papers (2023-11-28T13:55:53Z)
- EucliDreamer: Fast and High-Quality Texturing for 3D Models with Stable Diffusion Depth [5.158983929861116]
We present a novel method to generate textures for 3D models given text prompts and 3D meshes.
Additional depth information is taken into account to perform the Score Distillation Sampling (SDS) process.
arXiv Detail & Related papers (2023-11-27T06:55:53Z)
- HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation [41.82589219009301]
We propose HumanNorm, a novel approach for high-quality and realistic 3D human generation.
The main idea is to enhance the model's 2D perception of 3D geometry by learning a normal-adapted diffusion model and a normal-aligned diffusion model.
HumanNorm outperforms existing text-to-3D methods in both geometry and texture quality.
arXiv Detail & Related papers (2023-10-02T17:59:17Z)
- 3D Human Texture Estimation from a Single Image with Transformers [106.6320286821364]
We propose a Transformer-based framework for 3D human texture estimation from a single image.
We also propose a mask-fusion strategy to combine the advantages of the RGB-based and texture-flow-based models.
arXiv Detail & Related papers (2021-09-06T16:00:20Z)
- 3D Dense Geometry-Guided Facial Expression Synthesis by Adversarial Learning [54.24887282693925]
We propose a novel framework to exploit 3D dense (depth and surface normals) information for expression manipulation.
We use an off-the-shelf state-of-the-art 3D reconstruction model to estimate the depth and create a large-scale RGB-Depth dataset.
Our experiments demonstrate that the proposed method outperforms the competitive baseline and existing arts by a large margin.
arXiv Detail & Related papers (2020-09-30T17:12:35Z)
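The StableDreamer entry above notes an equivalence between the SDS generative prior and a simple supervised L2 reconstruction loss against the one-step denoised image. A toy finite-difference check of that identity, treating the denoised estimate as a constant (stop-gradient) target; all names and values here are illustrative:

```python
import random

random.seed(0)
x = [random.gauss(0, 1) for _ in range(4)]       # "rendered image", flattened
x0_hat = [random.gauss(0, 1) for _ in range(4)]  # one-step denoised estimate (constant)
w = 0.7                                          # timestep weighting w(t)

def l2_loss(v):
    # Supervised L2 reconstruction loss: 0.5 * w * ||v - x0_hat||^2
    return 0.5 * w * sum((vi - ti) ** 2 for vi, ti in zip(v, x0_hat))

# SDS-style residual gradient: w * (x - x0_hat)
grad_sds = [w * (xi - ti) for xi, ti in zip(x, x0_hat)]

# Central finite differences of the L2 loss, coordinate by coordinate
eps = 1e-6
grad_l2 = []
for i in range(len(x)):
    xp = list(x); xp[i] += eps
    xm = list(x); xm[i] -= eps
    grad_l2.append((l2_loss(xp) - l2_loss(xm)) / (2 * eps))

# The two gradients agree up to numerical error
assert all(abs(a - b) < 1e-4 for a, b in zip(grad_sds, grad_l2))
```

Since the loss is quadratic, its exact gradient is w * (x - x0_hat), so the finite-difference estimate matches the SDS residual almost exactly.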
This list is automatically generated from the titles and abstracts of the papers on this site.