Semantic Text-to-Face GAN - ST^2FG
- URL: http://arxiv.org/abs/2107.10756v4
- Date: Wed, 13 Dec 2023 08:44:16 GMT
- Title: Semantic Text-to-Face GAN - ST^2FG
- Authors: Manan Oza, Sukalpa Chanda and David Doermann
- Abstract summary: We present a novel approach to generate facial images from semantic text descriptions.
For security and criminal identification, the ability to provide a GAN-based system that works like a sketch artist would be incredibly useful.
- Score: 0.7919810878571298
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Faces generated using generative adversarial networks (GANs) have reached
unprecedented realism. These faces, also known as "Deep Fakes", appear as
realistic photographs with very little pixel-level distortion. While some work
has enabled the training of models that generate images with specific
properties of the subject, generating a facial image from a natural
language description has not been fully explored. For security and criminal
identification, the ability to provide a GAN-based system that works like a
sketch artist would be incredibly useful. In this paper, we present a novel
approach to generate facial images from semantic text descriptions. The learned
model is provided with a text description and an outline of the type of face,
which the model uses to sketch the features. Our models are trained using an
Affine Combination Module (ACM) mechanism to combine the text embedding from
BERT and the GAN latent space using a self-attention matrix. This avoids the
loss of features due to inadequate "attention", which may occur if the text
embedding and latent vector are simply concatenated. Our approach generates
images that align closely with exhaustive textual descriptions of faces,
capturing many fine facial details and producing better images. The proposed
method can also make incremental changes to a previously generated image when
provided with additional textual descriptions or sentences.
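The ACM fusion described in the abstract lends itself to a short illustration. Below is a minimal sketch, in PyTorch, of how a BERT text embedding and a GAN latent vector could be combined through self-attention followed by an affine modulation rather than simple concatenation; the dimensions, layer choices, and names (e.g. AffineCombinationModule) are assumptions made for illustration and do not reproduce the authors' implementation.

```python
# Hypothetical sketch of an Affine Combination Module (ACM)-style fusion of a
# BERT text embedding with a GAN latent vector via self-attention, as described
# in the abstract. Dimensions and layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class AffineCombinationModule(nn.Module):
    def __init__(self, text_dim=768, latent_dim=512):
        super().__init__()
        # Project the BERT sentence embedding into the GAN latent dimension.
        self.text_proj = nn.Linear(text_dim, latent_dim)
        # Self-attention over the two fused tokens (text, latent).
        self.attn = nn.MultiheadAttention(embed_dim=latent_dim, num_heads=8,
                                          batch_first=True)
        # Predict per-channel affine parameters (scale, shift) from the
        # attended text features, instead of concatenating the two vectors.
        self.to_gamma = nn.Linear(latent_dim, latent_dim)
        self.to_beta = nn.Linear(latent_dim, latent_dim)

    def forward(self, text_emb, latent):
        # text_emb: (B, text_dim) from BERT; latent: (B, latent_dim) from the GAN.
        t = self.text_proj(text_emb)                     # (B, latent_dim)
        tokens = torch.stack([t, latent], dim=1)         # (B, 2, latent_dim)
        attended, _ = self.attn(tokens, tokens, tokens)  # self-attention over both
        text_ctx = attended[:, 0]                        # text token after attention
        gamma, beta = self.to_gamma(text_ctx), self.to_beta(text_ctx)
        # Affine combination: modulate the latent rather than concatenate.
        return gamma * latent + beta

# Usage sketch: fuse a batch of text embeddings with GAN latents.
acm = AffineCombinationModule()
fused = acm(torch.randn(4, 768), torch.randn(4, 512))   # (4, 512) conditioned latent
```

Under this reading, the text acts as a conditioning signal that rescales and shifts the latent, so an additional sentence could plausibly be folded in by re-running the fusion on a previously conditioned latent; this is one way to interpret the incremental-editing capability the abstract describes, not a detail confirmed by the paper.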
Related papers
- Towards Localized Fine-Grained Control for Facial Expression Generation [54.82883891478555]
Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent.
Current generative models mostly generate flat neutral expressions and characterless smiles without authenticity.
We propose the use of AUs (action units) for facial expression control in face generation.
arXiv Detail & Related papers (2024-07-25T18:29:48Z)
- Improving face generation quality and prompt following with synthetic captions [57.47448046728439]
We introduce a training-free pipeline designed to generate accurate appearance descriptions from images of people.
We then use these synthetic captions to fine-tune a text-to-image diffusion model.
Our results demonstrate that this approach significantly improves the model's ability to generate high-quality, realistic human faces.
arXiv Detail & Related papers (2024-05-17T15:50:53Z)
- TextGaze: Gaze-Controllable Face Generation with Natural Language [20.957791298860712]
We present a novel gaze-controllable face generation task.
Our approach inputs textual descriptions that describe human gaze and head behavior and generates corresponding face images.
Experiments on the FFHQ dataset show the effectiveness of our method.
arXiv Detail & Related papers (2024-04-26T15:42:24Z)
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation [59.76645602354481]
FlashFace allows users to easily personalize their own photos by providing one or a few reference face images and a text prompt.
Our approach is distinguished from existing human photo customization methods by higher-fidelity identity preservation and better instruction following.
arXiv Detail & Related papers (2024-03-25T17:59:57Z)
- When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
- GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, which is based on parametric 3D facial representations and can stably decouple expression.
We achieve higher-quality and more accurate facial expression transfer results compared to state-of-the-art methods, and demonstrate applicability to various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z)
- Face Generation from Textual Features using Conditionally Trained Inputs to Generative Adversarial Networks [0.0]
We use the power of state-of-the-art natural language processing models to convert face descriptions into learnable latent vectors.
The same approach can be tailored to generate any image based on fine-grained textual features.
arXiv Detail & Related papers (2023-01-22T13:27:12Z)
- AnyFace: Free-style Text-to-Face Synthesis and Manipulation [41.61972206254537]
This paper proposes the first free-style text-to-face method, namely AnyFace.
AnyFace enables much wider open world applications such as metaverse, social media, cosmetics, forensics, etc.
arXiv Detail & Related papers (2022-03-29T08:27:38Z)
- Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection [65.92058628082322]
Non-parametric face modeling aims to reconstruct 3D faces from images alone, without shape assumptions.
This paper presents a novel Learning to Aggregate and Personalize framework for unsupervised robust 3D face modeling.
arXiv Detail & Related papers (2021-06-15T03:10:17Z)
- S2FGAN: Semantically Aware Interactive Sketch-to-Face Translation [11.724779328025589]
This paper proposes a sketch-to-image generation framework called S2FGAN.
We employ two latent spaces to control the face appearance and adjust the desired attributes of the generated face.
Our method successfully outperforms state-of-the-art methods on attribute manipulation by exploiting greater control of attribute intensity.
arXiv Detail & Related papers (2020-11-30T13:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.