HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation
- URL: http://arxiv.org/abs/2312.07539v2
- Date: Wed, 8 May 2024 13:29:25 GMT
- Title: HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation
- Authors: Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen,
- Abstract summary: This work presents HeadArtist for 3D head generation from text descriptions.
We come up with an efficient pipeline that optimize a parameterized 3D head model under the supervision of the prior distillation.
Experimental results suggest that our approach delivers high-quality 3D head sculptures with adequate geometry and photorealistic appearance.
- Score: 95.58892028614444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents HeadArtist for 3D head generation from text descriptions. With a landmark-guided ControlNet serving as the generative prior, we come up with an efficient pipeline that optimizes a parameterized 3D head model under the supervision of the prior distillation itself. We call such a process self score distillation (SSD). In detail, given a sampled camera pose, we first render an image and its corresponding landmarks from the head model, and add some particular level of noise onto the image. The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction. Two different classifier-free guidance (CFG) weights are applied during these two predictions, and the prediction difference offers a direction on how the rendered image can better match the text of interest. Experimental results suggest that our approach delivers high-quality 3D head sculptures with adequate geometry and photorealistic appearance, significantly outperforming state-ofthe-art methods. We also show that the same pipeline well supports editing the generated heads, including both geometry deformation and appearance change.
Related papers
- NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior [5.819784482811377]
We propose a novel method, NeRFFaceSpeech, which enables to produce high-quality 3D-aware talking head.
Our method can craft a 3D-consistent facial feature space corresponding to a single image.
We also introduce LipaintNet that can replenish the lacking information in the inner-mouth area.
arXiv Detail & Related papers (2024-05-09T13:14:06Z) - Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance
Fields using Geometry-Guided Text-to-Image Diffusion Model [39.64952340472541]
We propose a controllable text-to-3D avatar generation method whose facial expression is controllable.
Our main strategy is to construct the 3D avatar in Neural Radiance Fields (NeRF) optimized with a set of controlled viewpoint-aware images.
We demonstrate the empirical results and discuss the effectiveness of our method.
arXiv Detail & Related papers (2023-09-07T08:14:46Z) - Articulated 3D Head Avatar Generation using Text-to-Image Diffusion
Models [107.84324544272481]
The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education.
Recent work on text-guided 3D object generation has shown great promise in addressing these needs.
We show that our diffusion-based articulated head avatars outperform state-of-the-art approaches for this task.
arXiv Detail & Related papers (2023-07-10T19:15:32Z) - Magic123: One Image to High-Quality 3D Object Generation Using Both 2D
and 3D Diffusion Priors [104.79392615848109]
We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D meshes from a single unposed image.
In the first stage, we optimize a neural radiance field to produce a coarse geometry.
In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture.
arXiv Detail & Related papers (2023-06-30T17:59:08Z) - ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image
Collections [71.46546520120162]
Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging.
We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
We produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations.
arXiv Detail & Related papers (2023-06-07T17:47:50Z) - HeadSculpt: Crafting 3D Head Avatars with Text [143.14548696613886]
We introduce a versatile pipeline dubbed HeadSculpt for crafting 3D head avatars from textual prompts.
We first equip the diffusion model with 3D awareness by leveraging landmark-based control and a learned textual embedding.
We propose a novel identity-aware editing score distillation strategy to optimize a textured mesh with a high-resolution differentiable rendering technique.
arXiv Detail & Related papers (2023-06-05T16:53:58Z) - Next3D: Generative Neural Texture Rasterization for 3D-Aware Head
Avatars [36.4402388864691]
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery.
Recent efforts incorporate 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly.
We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images.
arXiv Detail & Related papers (2022-11-21T06:40:46Z) - Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control [54.079327030892244]
Free-HeadGAN is a person-generic neural talking head synthesis system.
We show that modeling faces with sparse 3D facial landmarks are sufficient for achieving state-of-the-art generative performance.
arXiv Detail & Related papers (2022-08-03T16:46:08Z) - Convolutional Generation of Textured 3D Meshes [34.20939983046376]
We propose a framework that can generate triangle meshes and associated high-resolution texture maps, using only 2D supervision from single-view natural images.
A key contribution of our work is the encoding of the mesh and texture as 2D representations, which are semantically aligned and can be easily modeled by a 2D convolutional GAN.
We demonstrate the efficacy of our method on Pascal3D+ Cars and CUB, both in an unconditional setting and in settings where the model is conditioned on class labels, attributes, and text.
arXiv Detail & Related papers (2020-06-13T15:23:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.