Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation
Using only Images
- URL: http://arxiv.org/abs/2308.16758v1
- Date: Thu, 31 Aug 2023 14:26:33 GMT
- Title: Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation
Using only Images
- Authors: Cuican Yu, Guansong Lu, Yihan Zeng, Jian Sun, Xiaodan Liang, Huibin
Li, Zongben Xu, Songcen Xu, Wei Zhang, Hang Xu
- Abstract summary: TG-3DFace creates more realistic and aesthetically pleasing 3D faces, improving multi-view consistency (MVIC) by 9% over Latent3D.
The rendered face images generated by TG-3DFace achieve better FID and CLIP scores than text-to-2D face/image generation models.
- Score: 105.92311979305065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating 3D faces from textual descriptions has a multitude of applications, such as gaming, movies, and robotics. Recent progress has demonstrated the success of unconditional 3D face generation and text-to-3D shape generation. However, due to the scarcity of paired text-3D face data, text-driven 3D face generation remains an open problem. In this paper, we propose a text-guided 3D face generation method, referred to as TG-3DFace, for generating realistic 3D faces under text guidance. Specifically, we adopt an unconditional 3D face generation framework and equip it with text conditions, which learns text-guided 3D face generation from text-2D face data alone. On top of that, we propose two text-to-face cross-modal alignment techniques, global contrastive learning and a fine-grained alignment module, to enforce high semantic consistency between generated 3D faces and input texts. In addition, we introduce directional classifier guidance during inference, which encourages creativity for out-of-domain generation. Compared to existing methods, TG-3DFace creates more realistic and aesthetically pleasing 3D faces, improving multi-view consistency (MVIC) by 9% over Latent3D. The rendered face images generated by TG-3DFace achieve better FID and CLIP scores than text-to-2D face/image generation models, demonstrating its superiority in generating realistic and semantically consistent textures.
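The paper's code is not reproduced here, but the global contrastive learning described above follows the familiar CLIP-style recipe: embeddings of rendered faces and their paired captions are pulled together while mismatched pairs in the batch act as negatives. A minimal PyTorch sketch of such a symmetric InfoNCE loss (the function name, batch layout, and temperature value are illustrative assumptions, not TG-3DFace's exact implementation):

```python
import torch
import torch.nn.functional as F

def global_contrastive_loss(img_emb: torch.Tensor,
                            txt_emb: torch.Tensor,
                            temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss between a batch of rendered-face embeddings
    and their paired text embeddings, both of shape (B, D). Matched pairs
    sit on the diagonal of the similarity matrix; all other pairs in the
    batch serve as negatives."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature      # (B, B) scaled cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)       # face -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)   # text -> face direction
    return 0.5 * (loss_i2t + loss_t2i)
```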
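The directional classifier guidance used at inference is only named in the abstract; one plausible reading is a CLIP-directional objective in the spirit of StyleGAN-NADA, where the movement of the generated image in CLIP embedding space is encouraged to follow the movement from a source text to a target text. A hedged sketch under that assumption (the names and exact loss form are illustrative):

```python
import torch
import torch.nn.functional as F

def directional_guidance_loss(src_img_emb: torch.Tensor,
                              gen_img_emb: torch.Tensor,
                              src_txt_emb: torch.Tensor,
                              tgt_txt_emb: torch.Tensor) -> torch.Tensor:
    """CLIP-directional loss: the shift in image-embedding space should be
    parallel to the shift in text-embedding space. All inputs are CLIP
    embeddings of shape (B, D)."""
    img_dir = F.normalize(gen_img_emb - src_img_emb, dim=-1)
    txt_dir = F.normalize(tgt_txt_emb - src_txt_emb, dim=-1)
    return (1.0 - F.cosine_similarity(img_dir, txt_dir)).mean()
```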
Related papers
- Controllable 3D Face Generation with Conditional Style Code Diffusion [51.24656496304069]
TEx-Face (TExt & Expression-to-Face) addresses these challenges by dividing the task into three components: 3D GAN inversion, conditional style code diffusion, and 3D face decoding.
Experiments conducted on FFHQ, CelebA-HQ, and CelebA-Dialog demonstrate the promising performance of our TEx-Face.
arXiv Detail & Related papers (2023-12-21T15:32:49Z)
- Control3D: Towards Controllable Text-to-3D Generation [107.81136630589263]
We present Control3D, a text-to-3D generation method conditioned on an additional hand-drawn sketch.
A 2D conditioned diffusion model (ControlNet) is remoulded to guide the learning of a 3D scene parameterized as a NeRF.
We exploit a pre-trained differentiable photo-to-sketch model to directly estimate the sketch of the image rendered from the synthetic 3D scene (a minimal loss sketch appears after this list).
arXiv Detail & Related papers (2023-11-09T15:50:32Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- Fake It Without Making It: Conditioned Face Generation for Accurate 3D Face Reconstruction [5.079602839359523]
We present a method to generate a large-scale synthesised dataset of 250K photorealistic images and their corresponding shape parameters and depth maps, which we call SynthFace.
Our synthesis method conditions Stable Diffusion on depth maps sampled from the FLAME 3D Morphable Model (3DMM) of the human face, allowing us to generate a diverse set of shape-consistent facial images designed to be balanced across race and gender (a minimal depth-conditioning sketch appears after this list).
We propose ControlFace, a deep neural network, trained on SynthFace, which achieves competitive performance on the NoW benchmark, without requiring 3D supervision or manual 3D asset creation.
arXiv Detail & Related papers (2023-07-25T16:42:06Z)
- Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models [107.84324544272481]
The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education.
Recent work on text-guided 3D object generation has shown great promise in addressing these needs.
We show that our diffusion-based articulated head avatars outperform state-of-the-art approaches for this task.
arXiv Detail & Related papers (2023-07-10T19:15:32Z)
- Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields [29.907615852310204]
We present Text2NeRF, which is able to generate a wide range of 3D scenes purely from a text prompt.
Our method requires no additional training data but only a natural language description of the scene as the input.
arXiv Detail & Related papers (2023-05-19T10:58:04Z)
- High-Fidelity 3D Face Generation from Natural Language Descriptions [12.22081892575208]
We argue that the major obstacles lie in 1) the lack of high-quality 3D face data with descriptive text annotations, and 2) the complex mapping between the descriptive language space and the shape/appearance space.
We build the Describe3D dataset, the first large-scale dataset with fine-grained text descriptions for the text-to-3D face generation task.
We propose a two-stage framework that first generates a 3D face matching the concrete descriptions, then optimizes the parameters in the 3D shape and texture space with the abstract description to refine the 3D face model.
arXiv Detail & Related papers (2023-05-05T06:10:15Z)
- 3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation [107.46972849241168]
The 3D-TOGO model generates 3D objects in the form of neural radiance fields with good texture.
Experiments on the largest 3D object dataset (i.e., ABO) are conducted to verify that 3D-TOGO generates higher-quality 3D objects than prior methods.
arXiv Detail & Related papers (2022-12-02T11:31:49Z)
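As flagged in the Control3D entry above, here is a minimal sketch of what its sketch-consistency objective could look like: render a view of the NeRF, pass it through the frozen differentiable photo-to-sketch network, and compare against the hand-drawn input. The function and the choice of MSE are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def sketch_consistency_loss(rendered_rgb: torch.Tensor,
                            target_sketch: torch.Tensor,
                            photo_to_sketch: torch.nn.Module) -> torch.Tensor:
    """Compare the sketch estimated from a rendered NeRF view against the
    user's hand-drawn sketch. `photo_to_sketch` stands in for the frozen,
    pre-trained differentiable photo-to-sketch model; because it is
    differentiable, gradients flow back into the NeRF parameters."""
    pred_sketch = photo_to_sketch(rendered_rgb)  # estimated sketch of the render
    return F.mse_loss(pred_sketch, target_sketch)
```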
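Similarly, the depth-conditioned synthesis in the SynthFace entry can be approximated with an off-the-shelf depth ControlNet in Hugging Face diffusers. This is a generic sketch rather than SynthFace's released pipeline; the checkpoints are public diffusers models, and `flame_depth.png` is a placeholder for a depth map rendered from a sampled FLAME mesh:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Depth-conditioned Stable Diffusion: the depth map constrains facial
# geometry while the text prompt controls appearance.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

depth_map = load_image("flame_depth.png")  # placeholder: depth from a FLAME sample
image = pipe("a photorealistic portrait photo of a person",
             image=depth_map, num_inference_steps=30).images[0]
image.save("synth_face.png")
```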