GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior
- URL: http://arxiv.org/abs/2503.11143v1
- Date: Fri, 14 Mar 2025 07:16:43 GMT
- Title: GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior
- Authors: Zichen Tang, Yuan Yao, Miaomiao Cui, Liefeng Bo, Hongyu Yang
- Abstract summary: We propose a two-stage framework for generating identity-preserving realistic 3D humans from text and image prompts. Our core insight is to leverage human-centric knowledge to facilitate the generation process. Experiments demonstrate that GaussianIP outperforms existing methods in both visual quality and training efficiency.
- Score: 25.72805054203982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-guided 3D human generation has advanced with the development of efficient 3D representations and 2D-lifting methods like Score Distillation Sampling (SDS). However, current methods suffer from prolonged training times and often produce results that lack fine facial and garment details. In this paper, we propose GaussianIP, an effective two-stage framework for generating identity-preserving realistic 3D humans from text and image prompts. Our core insight is to leverage human-centric knowledge to facilitate the generation process. In stage 1, we propose a novel Adaptive Human Distillation Sampling (AHDS) method to rapidly generate a 3D human that maintains high identity consistency with the image prompt and achieves a realistic appearance. Compared to traditional SDS methods, AHDS better aligns with the human-centric generation process, enhancing visual quality with notably fewer training steps. To further improve the visual quality of the face and clothes regions, we design a View-Consistent Refinement (VCR) strategy in stage 2. Specifically, it iteratively produces detail-enhanced versions of the multi-view images from stage 1, ensuring 3D texture consistency across views via mutual attention and distance-guided attention fusion. A polished 3D human can then be obtained by directly performing reconstruction with the refined images. Extensive experiments demonstrate that GaussianIP outperforms existing methods in both visual quality and training efficiency, particularly in generating identity-preserving results. Our code is available at: https://github.com/silence-tang/GaussianIP.
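For context, below is a minimal sketch of plain Score Distillation Sampling, the 2D-lifting baseline that AHDS builds on. All names (`render`, `diffusion_eps`, `text_emb`, `alphas_cumprod`) are hypothetical placeholders for a real differentiable renderer and a frozen diffusion prior; the paper's AHDS additionally adapts the timestep schedule and guidance to the human-centric setting, which this sketch does not reproduce.

```python
import torch

def sds_step(render, diffusion_eps, text_emb, alphas_cumprod):
    """One plain SDS step: noise a rendering, query the frozen diffusion
    prior, and backprop (eps_pred - eps) through the renderer only
    (the U-Net gradient is skipped by construction)."""
    x = render()                                        # differentiable render, (B, C, H, W) in [-1, 1]
    b = x.shape[0]
    t = torch.randint(20, 981, (b,), device=x.device)   # random diffusion timestep per sample
    eps = torch.randn_like(x)
    a_t = alphas_cumprod[t].view(b, 1, 1, 1)
    x_t = a_t.sqrt() * x + (1.0 - a_t).sqrt() * eps     # forward process q(x_t | x_0)
    with torch.no_grad():
        eps_pred = diffusion_eps(x_t, t, text_emb)      # frozen prior (CFG assumed applied inside)
    w = 1.0 - a_t                                       # common timestep weighting w(t)
    grad = w * (eps_pred - eps)
    # Surrogate loss whose gradient w.r.t. x equals `grad`, as SDS prescribes
    loss = (grad.detach() * x).sum()
    loss.backward()
    return loss.item()
```

The stage-2 mutual attention with distance-guided fusion can likewise be sketched as a generic cross-view reference attention. This is an illustrative assumption, not the paper's exact VCR formulation:

```python
import torch
import torch.nn.functional as F

def fused_cross_view_attention(q, k, v, k_ref, v_ref, view_dist):
    """Hypothetical sketch: the target view's queries also attend to a
    reference view's keys/values, and the two results are blended by a
    weight that decays with the normalized camera distance
    `view_dist` in [0, 1] between the two views."""
    own = F.scaled_dot_product_attention(q, k, v)       # self-view attention
    mutual = F.scaled_dot_product_attention(            # attend across both views
        q, torch.cat([k, k_ref], dim=-2), torch.cat([v, v_ref], dim=-2)
    )
    w = 1.0 - view_dist                                 # closer views contribute more
    return w * mutual + (1.0 - w) * own
```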
Related papers
- Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance [69.9745497000557]
We introduce Arc2Avatar, the first SDS-based method utilizing a human face foundation model as guidance with just a single image as input.
Our avatars maintain a dense correspondence with a human face mesh template, allowing blendshape-based expression generation.
arXiv Detail & Related papers (2025-01-09T17:04:33Z)
- GECO: Generative Image-to-3D within a SECOnd [51.20830808525894]
We introduce GECO, a novel method for high-quality 3D generative modeling that operates within a second.
GECO achieves high-quality image-to-3D mesh generation with an unprecedented level of efficiency.
arXiv Detail & Related papers (2024-05-30T17:58:00Z)
- ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling [96.87575334960258]
ID-to-3D is a method to generate identity- and text-guided 3D human heads with disentangled expressions.
Results achieve an unprecedented level of identity-consistent and high-quality texture and geometry generation.
arXiv Detail & Related papers (2024-05-26T13:36:45Z)
- MVHuman: Tailoring 2D Diffusion with Multi-view Sampling For Realistic 3D Human Generation [45.88714821939144]
We present an alternative scheme named MVHuman to generate human radiance fields from text guidance.
Our core is a multi-view sampling strategy to tailor the denoising processes of the pre-trained network for generating consistent multi-view images.
arXiv Detail & Related papers (2023-12-15T11:56:26Z)
- HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting [113.37908093915837]
Existing methods optimize 3D representations like mesh or neural fields via score distillation sampling (SDS), which suffers from inadequate fine details or excessive training time.
In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance.
arXiv Detail & Related papers (2023-11-28T18:59:58Z)
- HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion [53.1558345421646]
We propose HumanRef, a 3D human generation framework from a single-view input.
To ensure the generated 3D model is photorealistic and consistent with the input image, HumanRef introduces a novel method called reference-guided score distillation sampling.
Experimental results demonstrate that HumanRef outperforms state-of-the-art methods in generating 3D clothed humans.
arXiv Detail & Related papers (2023-11-28T17:06:28Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)