HumanRef: Single Image to 3D Human Generation via Reference-Guided
Diffusion
- URL: http://arxiv.org/abs/2311.16961v1
- Date: Tue, 28 Nov 2023 17:06:28 GMT
- Title: HumanRef: Single Image to 3D Human Generation via Reference-Guided
Diffusion
- Authors: Jingbo Zhang, Xiaoyu Li, Qi Zhang, Yanpei Cao, Ying Shan, and Jing
Liao
- Abstract summary: We propose HumanRef, a 3D human generation framework from a single-view input.
To ensure the generated 3D model is photorealistic and consistent with the input image, HumanRef introduces a novel method called reference-guided score distillation sampling.
Experimental results demonstrate that HumanRef outperforms state-of-the-art methods in generating 3D clothed humans.
- Score: 53.1558345421646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating a 3D human model from a single reference image is challenging
because it requires inferring textures and geometries in invisible views while
maintaining consistency with the reference image. Previous methods utilizing 3D
generative models are limited by the availability of 3D training data.
Optimization-based methods that lift text-to-image diffusion models to 3D
generation often fail to preserve the texture details of the reference image,
resulting in inconsistent appearances in different views. In this paper, we
propose HumanRef, a 3D human generation framework from a single-view input. To
ensure the generated 3D model is photorealistic and consistent with the input
image, HumanRef introduces a novel method called reference-guided score
distillation sampling (Ref-SDS), which effectively incorporates image guidance
into the generation process. Furthermore, we introduce region-aware attention
to Ref-SDS, ensuring accurate correspondence between different body regions.
Experimental results demonstrate that HumanRef outperforms state-of-the-art
methods in generating 3D clothed humans with fine geometry, photorealistic
textures, and view-consistent appearances.
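To make the core idea of reference-guided score distillation more concrete, below is a minimal, hypothetical PyTorch sketch of an SDS-style objective steered by reference-image conditioning. The function name ref_sds_loss, the denoiser callable, and the reference_cond features are illustrative placeholders, not the paper's actual API; the sketch also omits HumanRef's region-aware attention and may differ from the exact Ref-SDS formulation.

import torch
import torch.nn.functional as F


def ref_sds_loss(
    rendered_rgb: torch.Tensor,    # (B, 3, H, W) differentiable render of the current 3D human
    reference_cond: torch.Tensor,  # conditioning features from the reference photo (placeholder)
    denoiser,                      # stand-in for a pretrained noise-prediction UNet: eps = denoiser(x_t, t, cond)
    alphas_cumprod: torch.Tensor,  # (T,) cumulative alpha schedule of the diffusion model
    t: torch.Tensor,               # (B,) sampled diffusion timesteps
    guidance_scale: float = 7.5,
) -> torch.Tensor:
    """Surrogate loss whose gradient matches an SDS-style update steered by
    the reference image (sketch only; not HumanRef's exact implementation)."""
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(rendered_rgb)
    # Forward-diffuse the rendering to timestep t.
    x_t = a_t.sqrt() * rendered_rgb + (1.0 - a_t).sqrt() * noise

    with torch.no_grad():
        # Classifier-free-guidance-style combination: the reference-image
        # conditioning pulls the predicted noise toward the input identity.
        eps_uncond = denoiser(x_t, t, cond=None)
        eps_ref = denoiser(x_t, t, cond=reference_cond)
        eps_hat = eps_uncond + guidance_scale * (eps_ref - eps_uncond)

    w = 1.0 - a_t                     # a common SDS weighting choice
    grad = w * (eps_hat - noise)      # SDS gradient direction
    # Standard SDS trick: detach the target so backpropagation through the
    # render reproduces `grad` without differentiating through the denoiser.
    target = (rendered_rgb - grad).detach()
    return 0.5 * F.mse_loss(rendered_rgb, target, reduction="sum")

In an optimization-based pipeline of this kind, the returned loss would be backpropagated through a differentiable renderer into the 3D representation at each iteration, so that unseen views are gradually pushed toward agreement with the reference image.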
Related papers
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding [16.50466940644004]
We present Isotropic3D, an image-to-3D generation pipeline that takes only an image CLIP embedding as input.
Isotropic3D allows the optimization to be isotropic w.r.t. the azimuth angle by relying solely on the SDS loss.
arXiv Detail & Related papers (2024-03-15T15:27:58Z)
- 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network that synthesizes new content of higher quality by exploiting both the natural image prior of the 2D diffusion model and the global 3D information of the current scene.
Our approach supports a wide variety of scenes and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z)
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present Sculpt3D, a new framework that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- MVHuman: Tailoring 2D Diffusion with Multi-view Sampling For Realistic 3D Human Generation [45.88714821939144]
We present an alternative scheme named MVHuman to generate human radiance fields from text guidance.
At its core is a multi-view sampling strategy that tailors the denoising processes of the pre-trained network to generate consistent multi-view images.
arXiv Detail & Related papers (2023-12-15T11:56:26Z)
- Single-Image 3D Human Digitization with Shape-Guided Diffusion [31.99621159464388]
NeRF and its variants typically require videos or images from different viewpoints.
We present an approach to generate a 360-degree view of a person with a consistent, high-resolution appearance from a single input image.
arXiv Detail & Related papers (2023-11-15T18:59:56Z)
- HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation [41.82589219009301]
We propose HumanNorm, a novel approach for high-quality and realistic 3D human generation.
The main idea is to enhance the model's 2D perception of 3D geometry by learning a normal-adapted diffusion model and a normal-aligned diffusion model.
HumanNorm outperforms existing text-to-3D methods in both geometry and texture quality.
arXiv Detail & Related papers (2023-10-02T17:59:17Z)
- ZeroAvatar: Zero-shot 3D Avatar Generation from a Single Image [17.285152757066527]
We present ZeroAvatar, a method that introduces an explicit 3D human body prior into the optimization process.
We show that ZeroAvatar significantly enhances the robustness and 3D consistency of optimization-based image-to-3D avatar generation.
arXiv Detail & Related papers (2023-05-25T18:23:20Z)
- NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
To deal with the lack of paired training data, we introduce a novel self-supervised strategy to train our network.
arXiv Detail & Related papers (2022-03-20T09:02:13Z)
- 3D-Aware Semantic-Guided Generative Model for Human Synthesis [67.86621343494998]
This paper proposes a 3D-aware Semantic-Guided Generative Model (3D-SGAN) for human image synthesis.
Our experiments on the DeepFashion dataset show that 3D-SGAN significantly outperforms the most recent baselines.
arXiv Detail & Related papers (2021-12-02T17:10:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.