ID-Sculpt: ID-aware 3D Head Generation from Single In-the-wild Portrait Image
- URL: http://arxiv.org/abs/2406.16710v3
- Date: Sun, 22 Dec 2024 05:16:16 GMT
- Title: ID-Sculpt: ID-aware 3D Head Generation from Single In-the-wild Portrait Image
- Authors: Jinkun Hao, Junshu Tang, Jiangning Zhang, Ran Yi, Yijia Hong, Moran Li, Weijian Cao, Yating Wang, Chengjie Wang, Lizhuang Ma
- Abstract summary: Previous text-based methods for generating 3D heads were limited by their text descriptions, and image-based methods struggled to produce high-quality head geometry.
We propose a novel framework, ID-Sculpt, to generate high-quality 3D heads while preserving their identities.
Extensive experiments demonstrate that we can generate high-quality 3D heads with accurate geometry and texture from a single in-the-wild portrait image.
- Score: 57.46195661521239
- Abstract: While recent works have achieved great success on image-to-3D object generation, high-quality, high-fidelity 3D head generation from a single image remains a great challenge. Previous text-based methods for generating 3D heads were limited by text descriptions, and image-based methods struggled to produce high-quality head geometry. To handle this challenging problem, we propose a novel framework, ID-Sculpt, to generate high-quality 3D heads while preserving their identities. Our work incorporates the identity information of the portrait image into three stages: 1) geometry initialization, 2) geometry sculpting, and 3) texture generation. Given a reference portrait image, we first align the identity features with text features to realize ID-aware guidance enhancement, which provides control signals that encode facial information. We then use the Canny edge map, the ID features of the portrait image, and a pre-trained text-to-normal/depth diffusion model to generate ID-aware geometry supervision, and employ 3D-GAN inversion to produce an ID-aware geometry initialization. Furthermore, with the ability to inject identity information into 3D head generation, we use the ID-aware guidance to compute an ID-aware Score Distillation (ISD) loss for geometry sculpting. For texture generation, we adopt ID-consistent texture inpainting and refinement, which progressively expands the viewpoint for texture inpainting to obtain an initial UV texture map. We then use the ID-aware guidance to provide image-level supervision on noisy multi-view images and obtain a refined texture map. Extensive experiments demonstrate that our method generates high-quality 3D heads with accurate geometry and texture from a single in-the-wild portrait image.
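The abstract does not spell out the ISD update, so the following is a minimal PyTorch sketch of an SDS-style loss with identity conditioning. It is not the authors' implementation: `ToyEpsModel`, `isd_loss`, and the FiLM-style ID injection are hypothetical stand-ins for the paper's ID-conditioned diffusion model.

```python
import torch
import torch.nn as nn

class ToyEpsModel(nn.Module):
    """Hypothetical stand-in for an ID-conditioned noise predictor
    eps(x_t, t, id). The real system would be a diffusion UNet with
    ControlNet-style identity conditioning."""
    def __init__(self, id_dim: int = 512):
        super().__init__()
        self.film = nn.Linear(id_dim, 3)          # inject ID features per channel
        self.body = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x_t, t, id_emb):
        scale = self.film(id_emb).view(-1, 3, 1, 1)
        return self.body(x_t * (1 + scale))       # toy stub: ignores t

def isd_loss(render, id_emb, eps_model, alphas_cumprod, t_min=20, t_max=980):
    """One SDS-style step: noise the rendering, query the ID-conditioned
    denoiser, and turn (eps_hat - eps) into a gradient on the render."""
    b = render.shape[0]
    t = torch.randint(t_min, t_max, (b,), device=render.device)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(render)
    x_t = a.sqrt() * render + (1 - a).sqrt() * eps   # forward diffusion q(x_t | x0)
    with torch.no_grad():
        eps_hat = eps_model(x_t, t, id_emb)          # ID-aware score estimate
    grad = (1 - a) * (eps_hat - eps)                 # common SDS weighting w(t)
    # Surrogate loss whose gradient w.r.t. `render` equals `grad`.
    return (grad.detach() * render).sum()

# Usage with dummy tensors (the renderer would normally produce `render`
# from differentiable 3D head parameters):
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = (1.0 - betas).cumprod(dim=0)
render = torch.rand(1, 3, 64, 64, requires_grad=True)
id_emb = torch.randn(1, 512)
loss = isd_loss(render, id_emb, ToyEpsModel(), alphas_cumprod)
loss.backward()   # gradients flow back into the 3D representation
```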
Related papers
- ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling [96.87575334960258]
ID-to-3D is a method to generate identity- and text-guided 3D human heads with disentangled expressions.
Results achieve an unprecedented level of identity consistency and quality in texture and geometry generation.
arXiv Detail & Related papers (2024-05-26T13:36:45Z)
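For context, the Score Distillation Sampling objective that ID-to-3D (above) builds on, introduced in DreamFusion, takes the following gradient for a differentiable renderer \(x = g(\theta)\), a frozen denoiser \(\hat{\epsilon}_\phi\), and a condition \(y\):

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,
    \big(\hat{\epsilon}_\phi(x_t;\, y, t) - \epsilon\big)\,
    \frac{\partial x}{\partial \theta} \right],
\qquad
x_t = \sqrt{\bar{\alpha}_t}\, x + \sqrt{1-\bar{\alpha}_t}\, \epsilon .
```

How ID-to-3D injects identity into the condition \(y\) and disentangles expressions is specific to that paper and is not reproduced here.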
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models [107.84324544272481]
The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education.
Recent work on text-guided 3D object generation has shown great promise in addressing these needs.
We show that our diffusion-based articulated head avatars outperform state-of-the-art approaches for this task.
arXiv Detail & Related papers (2023-07-10T19:15:32Z)
- TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision [114.56048848216254]
We present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions.
Based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates.
Our constructed captions provide high-level semantic supervision for generated 3D shapes.
arXiv Detail & Related papers (2023-03-23T13:53:16Z)
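The pseudo-captioning step summarized above can be sketched as follows: score candidate vocabulary words against a rendered image with CLIP and slot the top matches into a template. This is an illustrative sketch, not the TAPS3D code; the `candidates` list, the template string, and the `pseudo_caption` helper are invented for the example, and open_clip is assumed as the CLIP implementation.

```python
import torch
import open_clip
from PIL import Image

# Load a pretrained CLIP (model and weight names from the open_clip README).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Hypothetical candidate words; TAPS3D retrieves from the CLIP vocabulary.
candidates = ["red", "wooden", "metal", "leather", "sports car", "armchair"]

def pseudo_caption(image: Image.Image, top_k: int = 2) -> str:
    """Return a templated caption built from the top-k CLIP-matching words."""
    img = preprocess(image).unsqueeze(0)
    txt = tokenizer(candidates)
    with torch.no_grad():
        img_f = model.encode_image(img)
        txt_f = model.encode_text(txt)
        img_f = img_f / img_f.norm(dim=-1, keepdim=True)
        txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
        sims = (img_f @ txt_f.T).squeeze(0)           # cosine similarities
    words = [candidates[i] for i in sims.topk(top_k).indices.tolist()]
    return "a 3D rendering of a " + " ".join(words)    # made-up template

# Example: caption a rendered view loaded from disk.
# print(pseudo_caption(Image.open("render.png")))
```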
- Fine Detailed Texture Learning for 3D Meshes with Generative Models [33.42114674602613]
This paper presents a method to reconstruct high-quality textured 3D models from both multi-view and single-view images.
In the first stage, we focus on learning accurate geometry, whereas in the second stage, we focus on learning the texture with a generative adversarial network.
We demonstrate that our method achieves superior 3D textured models compared to the previous works.
arXiv Detail & Related papers (2022-03-17T14:50:52Z)
- StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation [34.01352591390208]
We introduce a high-resolution, 3D-consistent image and shape generation technique, which we call StyleSDF.
Our method is trained on single-view RGB data only, and stands on the shoulders of StyleGAN2 for image generation.
arXiv Detail & Related papers (2021-12-21T18:45:45Z)
- Improved Modeling of 3D Shapes with Multi-view Depth Maps [48.8309897766904]
We present a general-purpose framework for modeling 3D shapes using CNNs.
Using just a single depth image of the object, we can output a dense multi-view depth map representation of 3D objects.
arXiv Detail & Related papers (2020-09-07T17:58:27Z)