TextGaze: Gaze-Controllable Face Generation with Natural Language
- URL: http://arxiv.org/abs/2404.17486v3
- Date: Sat, 28 Sep 2024 08:46:25 GMT
- Title: TextGaze: Gaze-Controllable Face Generation with Natural Language
- Authors: Hengfei Wang, Zhongqun Zhang, Yihua Cheng, Hyung Jin Chang
- Abstract summary: We present a novel gaze-controllable face generation task.
Our approach inputs textual descriptions that describe human gaze and head behavior and generates corresponding face images.
Experiments on the FFHQ dataset show the effectiveness of our method.
- Score: 20.957791298860712
- Abstract: Generating face images with specific gaze information has attracted considerable attention. Existing approaches typically input gaze values directly for face generation, which is unnatural and requires annotated gaze datasets for training, thereby limiting their application. In this paper, we present a novel gaze-controllable face generation task. Our approach inputs textual descriptions that describe human gaze and head behavior and generates corresponding face images. Our work first introduces a text-of-gaze dataset containing over 90k text descriptions spanning a dense distribution of gaze and head poses. We further propose a gaze-controllable text-to-face method. Our method contains a sketch-conditioned face diffusion module and a model-based sketch diffusion module. We define a face sketch based on facial landmarks and an eye segmentation map. The face diffusion module generates face images from the face sketch, and the sketch diffusion module employs a 3D face model to generate the face sketch from the text description. Experiments on the FFHQ dataset show the effectiveness of our method. We will release our dataset and code for future research.
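The abstract describes a two-stage pipeline: a model-based sketch diffusion module maps the text description to a face sketch (landmarks plus eye segmentation), and a sketch-conditioned face diffusion module maps that sketch to an image. Below is a minimal Python sketch of that control flow; the class and method names (SketchDiffusionModule, FaceDiffusionModule, text_to_face) are illustrative assumptions, not the authors' released code, and the stub bodies stand in for trained diffusion models.

```python
# Minimal sketch of the two-stage pipeline described in the abstract.
# Names and shapes are illustrative assumptions, not the paper's API.
from dataclasses import dataclass
import numpy as np

@dataclass
class FaceSketch:
    """Face sketch as defined in the abstract: facial landmarks
    plus an eye segmentation map."""
    landmarks: np.ndarray         # (N, 2) 2D facial landmark coordinates
    eye_segmentation: np.ndarray  # (H, W) binary eye-region mask

class SketchDiffusionModule:
    """Model-based sketch diffusion: text description -> face sketch,
    guided by a 3D face model."""
    def generate(self, text: str) -> FaceSketch:
        # A real implementation would diffuse 3D face-model parameters
        # conditioned on the text, then project them to 2D landmarks and
        # an eye mask. Dummy outputs keep this sketch executable.
        return FaceSketch(landmarks=np.zeros((68, 2)),
                          eye_segmentation=np.zeros((256, 256)))

class FaceDiffusionModule:
    """Sketch-conditioned face diffusion: face sketch -> face image."""
    def generate(self, sketch: FaceSketch) -> np.ndarray:
        # A real implementation would run a sketch-conditioned denoising
        # loop; a blank (H, W, 3) array stands in for the output image.
        h, w = sketch.eye_segmentation.shape
        return np.zeros((h, w, 3))

def text_to_face(text: str) -> np.ndarray:
    sketch = SketchDiffusionModule().generate(text)  # stage 1: text -> sketch
    return FaceDiffusionModule().generate(sketch)    # stage 2: sketch -> image

image = text_to_face("a person glancing down-left, head tilted slightly right")
```

The split mirrors the abstract's description: gaze and head pose are settled in sketch space via the 3D face model before any pixels are synthesized.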
Related papers
- Knowledge-Enhanced Facial Expression Recognition with Emotional-to-Neutral Transformation [66.53435569574135]
Existing facial expression recognition methods typically fine-tune a pre-trained visual encoder using discrete labels.
We observe that the rich knowledge in text embeddings, generated by vision-language models, is a promising alternative for learning discriminative facial expression representations.
We propose a novel knowledge-enhanced FER method with an emotional-to-neutral transformation.
arXiv Detail & Related papers (2024-09-13T07:28:57Z)
- Towards Localized Fine-Grained Control for Facial Expression Generation [54.82883891478555]
Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent.
Current generative models mostly generate flat neutral expressions and characterless smiles without authenticity.
We propose the use of AUs (action units) for facial expression control in face generation.
arXiv Detail & Related papers (2024-07-25T18:29:48Z)
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation [59.76645602354481]
FlashFace allows users to easily personalize their own photos by providing one or a few reference face images and a text prompt.
Our approach is distinguished from existing human photo customization methods by higher-fidelity identity preservation and better instruction following.
arXiv Detail & Related papers (2024-03-25T17:59:57Z)
- Face0: Instantaneously Conditioning a Text-to-Image Model on a Face [3.5150821092068383]
We present Face0, a novel way to instantaneously condition a text-to-image generation model on a face.
We augment a dataset of annotated images with embeddings of the included faces and train an image generation model on the augmented dataset.
Our method achieves pleasing results, is remarkably simple, extremely fast, and equips the underlying model with new capabilities.
arXiv Detail & Related papers (2023-06-11T09:52:03Z)
- AnyFace: Free-style Text-to-Face Synthesis and Manipulation [41.61972206254537]
This paper proposes AnyFace, the first free-style text-to-face method.
AnyFace enables much wider open-world applications such as the metaverse, social media, cosmetics, and forensics.
arXiv Detail & Related papers (2022-03-29T08:27:38Z)
- Semantic Text-to-Face GAN - ST^2FG [0.7919810878571298]
We present a novel approach to generate facial images from semantic text descriptions.
For security and criminal identification, the ability to provide a GAN-based system that works like a sketch artist would be incredibly useful.
arXiv Detail & Related papers (2021-07-22T15:42:25Z)
- VariTex: Variational Neural Face Textures [0.0]
VariTex is a method that learns a variational latent feature space of neural face textures.
To generate images of complete human heads, we propose an additive decoder that generates plausible additional details such as hair.
The resulting method can generate geometrically consistent images of novel identities allowing fine-grained control over head pose, face shape, and facial expressions.
arXiv Detail & Related papers (2021-04-13T07:47:53Z)
- FaceDet3D: Facial Expressions with 3D Geometric Detail Prediction [62.5557724039217]
Facial expressions induce a variety of high-level details on the 3D face geometry.
3D Morphable Models (3DMMs) of the human face fail to capture such fine details in their PCA-based representations.
We introduce FaceDet3D, a first-of-its-kind method that generates, from a single image, geometric facial details consistent with any desired target expression.
arXiv Detail & Related papers (2020-12-14T23:07:38Z)
- Face Forgery Detection by 3D Decomposition [72.22610063489248]
We consider a face image as the joint product of the underlying 3D geometry and the lighting environment.
By disentangling the face image into 3D shape, common texture, identity texture, ambient light, and direct light, we find the devil lies in the direct light and the identity texture.
We propose to utilize facial detail, which is the combination of direct light and identity texture, as the clue to detect the subtle forgery patterns.
arXiv Detail & Related papers (2020-11-19T09:25:44Z)
- FaR-GAN for One-Shot Face Reenactment [20.894596219099164]
We present a one-shot face reenactment model, FaR-GAN, that takes only one face image of any given source identity and a target expression as input.
The proposed method makes no assumptions about the source identity, facial expression, head pose, or even image background.
arXiv Detail & Related papers (2020-05-13T16:15:37Z)
- It's Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation [82.16380486281108]
We propose an appearance-based method that only takes the full face image as input.
Our method encodes the face image using a convolutional neural network with spatial weights applied on the feature maps (sketched in code below).
We show that our full-face method significantly outperforms the state of the art for both 2D and 3D gaze estimation.
arXiv Detail & Related papers (2016-11-27T15:00:10Z)
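The spatial-weights mechanism in this last entry is compact enough to sketch. Below is a minimal PyTorch rendering, assuming a small 1x1-convolution branch that predicts a single-channel weight map over spatial locations; the layer sizes, feature shape, and regression head are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of spatial weighting on CNN feature maps: a weight map
# rescales features elementwise before gaze regression. Sizes are
# illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class SpatialWeights(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions collapse the channel dimension into a
        # single-channel weight map over spatial locations.
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 1, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        weights = self.net(features)  # (B, 1, H, W)
        return features * weights     # broadcast across channels

# Usage: rescale backbone features, then regress a 2D gaze direction.
feats = torch.randn(8, 256, 13, 13)               # e.g. conv5 features
weighted = SpatialWeights(256)(feats)
gaze = nn.Linear(256 * 13 * 13, 2)(weighted.flatten(1))  # (yaw, pitch)
```

The intuition, as the summary states, is that weighting lets the regressor emphasize informative face regions (eyes, head contour) while suppressing the rest, which is what allows a full-face input to beat eye-crop baselines.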