FaceGPT: Self-supervised Learning to Chat about 3D Human Faces
- URL: http://arxiv.org/abs/2406.07163v1
- Date: Tue, 11 Jun 2024 11:13:29 GMT
- Title: FaceGPT: Self-supervised Learning to Chat about 3D Human Faces
- Authors: Haoran Wang, Mohit Mendiratta, Christian Theobalt, Adam Kortylewski,
- Abstract summary: We introduce FaceGPT, a self-supervised learning framework for Large Vision-Language Models (VLMs) to reason about 3D human faces from images and text.
FaceGPT overcomes the lack of semantic reasoning in typical 3D face reconstruction methods by embedding the parameters of a 3D morphable face model (3DMM) into the token space of a VLM.
We show that FaceGPT achieves high-quality 3D face reconstructions and retains the ability for general-purpose visual instruction following.
- Score: 69.4651241319356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce FaceGPT, a self-supervised learning framework for Large Vision-Language Models (VLMs) to reason about 3D human faces from images and text. Typical 3D face reconstruction methods are specialized algorithms that lack semantic reasoning capabilities. FaceGPT overcomes this limitation by embedding the parameters of a 3D morphable face model (3DMM) into the token space of a VLM, enabling the generation of 3D faces from both textual and visual inputs. FaceGPT is trained in a self-supervised manner as a model-based autoencoder from in-the-wild images. In particular, the hidden state of the LLM is projected into 3DMM parameters, which are subsequently rendered as a 2D face image to guide the self-supervised learning process via image-based reconstruction. Without relying on expensive 3D annotations of human faces, FaceGPT obtains a detailed understanding of 3D human faces while preserving the capacity to understand general user instructions. Our experiments demonstrate that FaceGPT not only achieves high-quality 3D face reconstructions but also retains the ability for general-purpose visual instruction following. Furthermore, FaceGPT learns, in a fully self-supervised manner, to generate 3D faces based on complex textual inputs, which opens a new direction in human face analysis.
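The model-based autoencoder described in the abstract can be sketched as a chain: an LLM hidden state is mapped by a projection head to 3DMM coefficients, the 3DMM decodes them into a mesh, and a rendering of that mesh is compared with the input image. The toy version below rests on several assumptions that are not from the paper: a random linear projection head, a tiny random linear 3DMM with 4 vertices, and an orthographic projection of vertices compared against 2D points in place of FaceGPT's actual differentiable face rendering and image-based loss. All names and sizes are hypothetical, and training (gradient updates) is omitted.

```python
import random

# Hypothetical toy sizes; real VLM hidden states and 3DMM bases are far larger.
HIDDEN_DIM = 8
N_ID, N_EXP = 3, 2   # identity / expression coefficient counts (assumed)
N_VERTS = 4          # toy mesh with 4 vertices (assumed)

rng = random.Random(0)

def rand_matrix(rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Hypothetical linear head mapping an LLM hidden state to 3DMM parameters.
W_head = rand_matrix(N_ID + N_EXP, HIDDEN_DIM)

# Toy linear 3DMM: mean shape plus identity and expression bases (flattened xyz).
mean_shape = [rng.uniform(-1, 1) for _ in range(3 * N_VERTS)]
B_id = rand_matrix(3 * N_VERTS, N_ID, scale=0.5)
B_exp = rand_matrix(3 * N_VERTS, N_EXP, scale=0.5)

def decode_3dmm(params):
    """Linear 3DMM: S = mean + B_id @ alpha + B_exp @ beta."""
    alpha, beta = params[:N_ID], params[N_ID:]
    id_offset = matvec(B_id, alpha)
    exp_offset = matvec(B_exp, beta)
    return [m + i + e for m, i, e in zip(mean_shape, id_offset, exp_offset)]

def project_orthographic(shape):
    """Stand-in for a differentiable renderer: keep (x, y) of each vertex."""
    return [(shape[3 * k], shape[3 * k + 1]) for k in range(N_VERTS)]

def reconstruction_loss(hidden_state, observed_2d):
    """Encode (hidden state -> params), decode (3DMM -> 2D), compare to target."""
    params = matvec(W_head, hidden_state)
    predicted_2d = project_orthographic(decode_3dmm(params))
    squared_errors = [
        (px - ox) ** 2 + (py - oy) ** 2
        for (px, py), (ox, oy) in zip(predicted_2d, observed_2d)
    ]
    return sum(squared_errors) / N_VERTS
```

With a zero hidden state the head outputs zero coefficients, the 3DMM returns its mean shape, and the loss against the mean shape's own projection is exactly 0.0. In a real system, gradients of such a loss would flow back through the renderer and the projection head, which is what allows learning without 3D annotations.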
Related papers
- Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images [105.92311979305065]
TG-3DFace creates more realistic and aesthetically pleasing 3D faces, improving multi-view consistency (MVIC) by 9% over Latent3D.
The rendered face images generated by TG-3DFace achieve higher FID and CLIP score than text-to-2D face/image generation models.
arXiv Detail & Related papers (2023-08-31T14:26:33Z)
- DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation [18.511092587156657]
We present a novel self-supervised method for learning dense 3D facial geometry from face videos.
We also propose a strategy to learn pixel-level uncertainties in order to identify the more reliable rigid-motion pixels for geometry learning.
We develop a 3D-aware cross-modal (i.e., appearance and depth) attention mechanism to capture facial geometries in a coarse-to-fine manner.
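Learning pixel-level uncertainties is often done with a loss that divides each pixel's residual by a predicted scale σ and adds log σ, so that unreliable pixels are down-weighted without the model being able to ignore every pixel. The sketch below uses that common Laplacian-style formulation as an assumption; it is not taken from DaGAN++, and `uncertainty_weighted_l1` is a hypothetical name.

```python
import math

def uncertainty_weighted_l1(residuals, sigmas):
    """Mean over pixels of |r| / sigma + log(sigma).

    A large predicted sigma shrinks the weight 1/sigma of that pixel's residual;
    the log(sigma) term penalizes declaring every pixel uncertain.
    """
    if len(residuals) != len(sigmas):
        raise ValueError("residuals and sigmas must have the same length")
    total = 0.0
    for r, s in zip(residuals, sigmas):
        if s <= 0:
            raise ValueError("sigma must be positive")
        total += abs(r) / s + math.log(s)
    return total / len(residuals)
```

For a fixed residual r, the per-pixel term |r|/σ + log σ is minimized at σ = |r|, so pixels that consistently disagree with the rigid-motion model end up with large σ and contribute less to the geometry gradient.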
arXiv Detail & Related papers (2023-05-10T14:58:33Z)
- Generating 2D and 3D Master Faces for Dictionary Attacks with a Network-Assisted Latent Space Evolution [68.8204255655161]
A master face is a face image that passes face-based identity authentication for a high percentage of the population.
We optimize these faces for 2D and 3D face verification models.
In 3D, we generate faces using the 2D StyleGAN2 generator and predict a 3D structure using a deep 3D face reconstruction network.
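The "high percentage of the population" criterion can be made concrete as a coverage score. A minimal sketch, assuming faces are compared as embedding vectors with cosine similarity against a fixed verification threshold; the embeddings, the threshold, and the function names are hypothetical and not the paper's verification models.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def coverage(candidate, enrolled, threshold):
    """Fraction of enrolled identities the candidate face would be accepted as."""
    matches = sum(1 for e in enrolled if cosine_similarity(candidate, e) >= threshold)
    return matches / len(enrolled)

def dictionary_coverage(candidates, enrolled, threshold):
    """Fraction of enrolled identities accepted by at least one face in the dictionary."""
    matches = sum(
        1 for e in enrolled
        if any(cosine_similarity(c, e) >= threshold for c in candidates)
    )
    return matches / len(enrolled)
```

A score of this kind is a natural objective for the latent-space search the summary mentions, although the exact fitness function used in the paper is not stated here.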
arXiv Detail & Related papers (2022-11-25T09:15:38Z)
- Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
The model supports various AR/VR and entertainment applications, such as face video and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z)
- Realistic face animation generation from videos [2.398608007786179]
3D face reconstruction and face alignment are two fundamental and highly related topics in computer vision.
Recently, some works have started using deep learning models to estimate 3DMM coefficients for reconstructing 3D face geometry.
To address this problem, end-to-end methods that completely bypass the calculation of 3DMM coefficients have been proposed.
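For reference, the 3DMM coefficients mentioned above are, in the standard linear formulation of a 3D morphable model (not necessarily the exact variant used in this work), the identity weights α and expression weights β in

```latex
\mathbf{S}(\boldsymbol{\alpha}, \boldsymbol{\beta})
  = \bar{\mathbf{S}} + \mathbf{B}_{\mathrm{id}}\,\boldsymbol{\alpha} + \mathbf{B}_{\mathrm{exp}}\,\boldsymbol{\beta},
\qquad
\bar{\mathbf{S}} \in \mathbb{R}^{3N},\;
\mathbf{B}_{\mathrm{id}} \in \mathbb{R}^{3N \times k_{\mathrm{id}}},\;
\mathbf{B}_{\mathrm{exp}} \in \mathbb{R}^{3N \times k_{\mathrm{exp}}}
```

where N is the number of mesh vertices and k_id, k_exp are the numbers of basis vectors. A method that bypasses the coefficients would instead regress the 3N vertex coordinates of S directly.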
arXiv Detail & Related papers (2021-03-27T20:18:14Z)
- Reconstructing A Large Scale 3D Face Dataset for Deep 3D Face Identification [9.159921061636695]
We propose a framework of 2D-aided deep 3D face identification.
In particular, we propose to reconstruct millions of 3D face scans from a large-scale 2D face database.
Our proposed approach achieves state-of-the-art rank-1 scores on the FRGC v2.0, Bosphorus, and BU-3DFE 3D face databases.
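Rank-1 scores in closed-set identification count how often the top-ranked gallery entry for a probe has the probe's identity. A minimal sketch, assuming a precomputed probe-by-gallery similarity matrix where higher means more similar; the function name and data are illustrative, not from the paper.

```python
def rank1_accuracy(similarity, probe_labels, gallery_labels):
    """similarity[i][j] is the score between probe i and gallery entry j."""
    correct = 0
    for i, row in enumerate(similarity):
        # Index of the most similar gallery entry (first one wins on ties).
        best_j = max(range(len(row)), key=row.__getitem__)
        if gallery_labels[best_j] == probe_labels[i]:
            correct += 1
    return correct / len(similarity)
```

For example, with gallery identities `["a", "b", "c"]`, probes `["a", "c", "b"]`, and similarity rows `[0.9, 0.1, 0.3]`, `[0.2, 0.4, 0.8]`, `[0.7, 0.6, 0.1]`, the first two probes are matched correctly and the third is not, giving a rank-1 accuracy of 2/3.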
arXiv Detail & Related papers (2020-10-16T13:48:38Z)
- Learning 3D Face Reconstruction with a Pose Guidance Network [49.13404714366933]
We present a self-supervised approach to learning monocular 3D face reconstruction with a pose guidance network (PGN).
First, we identify pose estimation as the bottleneck of prior parametric 3D face learning methods and propose using 3D face landmarks to estimate pose parameters.
With our specially designed PGN, our model can learn from both faces with fully labeled 3D landmarks and unlimited unlabeled in-the-wild face images.
arXiv Detail & Related papers (2020-10-09T06:11:17Z)
- StyleRig: Rigging StyleGAN for 3D Control over Portrait Images [81.43265493604302]
StyleGAN generates portrait images of faces with eyes, teeth, hair, and context (neck, shoulders, background).
However, StyleGAN lacks rig-like control over semantic face parameters that are interpretable in 3D, such as face pose, expressions, and scene illumination.
We present the first method to provide rig-like face control over a pretrained and fixed StyleGAN via a 3DMM.
arXiv Detail & Related papers (2020-03-31T21:20:34Z)