Semantic Text-to-Face GAN -ST^2FG
- URL: http://arxiv.org/abs/2107.10756v4
- Date: Wed, 13 Dec 2023 08:44:16 GMT
- Title: Semantic Text-to-Face GAN -ST^2FG
- Authors: Manan Oza, Sukalpa Chanda and David Doermann
- Abstract summary: We present a novel approach to generate facial images from semantic text descriptions.
For security and criminal identification, the ability to provide a GAN-based system that works like a sketch artist would be incredibly useful.
- Score: 0.7919810878571298
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Faces generated using generative adversarial networks (GANs) have reached
unprecedented realism. These faces, also known as "Deep Fakes", appear as
realistic photographs with very few pixel-level distortions. While some work
has enabled the training of models that lead to the generation of specific
properties of the subject, generating a facial image based on a natural
language description has not been fully explored. For security and criminal
identification, the ability to provide a GAN-based system that works like a
sketch artist would be incredibly useful. In this paper, we present a novel
approach to generate facial images from semantic text descriptions. The learned
model is provided with a text description and an outline of the type of face,
which the model uses to sketch the features. Our models are trained using an
Affine Combination Module (ACM) mechanism to combine the text embedding from
BERT and the GAN latent space using a self-attention matrix. This avoids the
loss of features due to inadequate "attention", which may happen if text
embedding and latent vector are simply concatenated. Our approach generates
images that align closely with exhaustive textual descriptions of faces,
including many fine-detail facial features, and thereby helps in
generating better images. The proposed method is also capable of making
incremental changes to a previously generated image if it is provided with
additional textual descriptions or sentences.
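
The contrast the abstract draws between affine combination and plain concatenation can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the toy dimensions, and the random weights are assumptions made for illustration, and the self-attention matrix mentioned in the abstract is omitted. It only shows the general idea of an affine combination, where a scale and a shift are predicted from the text embedding and applied element-wise to the latent vector.

```python
import random


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix, stored as nested lists, by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng, std=0.1):
    """Build a small random weight matrix (stand-in for learned parameters)."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def affine_combine(latent, text_emb, w_scale, w_shift):
    """Hypothetical affine combination: latent * scale(text) + shift(text)."""
    scale = linear(w_scale, text_emb)  # one scale per latent dimension
    shift = linear(w_shift, text_emb)  # one shift per latent dimension
    return [z * s + b for z, s, b in zip(latent, scale, shift)]


rng = random.Random(0)
text_dim, latent_dim = 8, 4  # toy sizes; a BERT-base sentence embedding has 768 dims

text_emb = [rng.gauss(0.0, 1.0) for _ in range(text_dim)]  # stand-in for a BERT embedding
latent = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # stand-in for a GAN latent vector

w_scale = random_matrix(latent_dim, text_dim, rng)
w_shift = random_matrix(latent_dim, text_dim, rng)

fused = affine_combine(latent, text_emb, w_scale, w_shift)

# Baseline the abstract argues against: simple concatenation of the two vectors.
concatenated = latent + text_emb

print(len(fused), len(concatenated))
```

In this sketch the fused vector keeps the latent dimensionality and every latent component is modulated by the full text embedding, whereas concatenation only places the two vectors side by side and leaves any interaction to later layers.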
Related papers
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
- Towards Localized Fine-Grained Control for Facial Expression Generation [54.82883891478555]
Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent.
Current generative models mostly generate flat neutral expressions and characterless smiles without authenticity.
We propose the use of AUs (action units) for facial expression control in face generation.
arXiv Detail & Related papers (2024-07-25T18:29:48Z)
- Improving face generation quality and prompt following with synthetic captions [57.47448046728439]
We introduce a training-free pipeline designed to generate accurate appearance descriptions from images of people.
We then use these synthetic captions to fine-tune a text-to-image diffusion model.
Our results demonstrate that this approach significantly improves the model's ability to generate high-quality, realistic human faces.
arXiv Detail & Related papers (2024-05-17T15:50:53Z)
- TextGaze: Gaze-Controllable Face Generation with Natural Language [20.957791298860712]
We present a novel gaze-controllable face generation task.
Our approach inputs textual descriptions that describe human gaze and head behavior and generates corresponding face images.
Experiments on the FFHQ dataset show the effectiveness of our method.
arXiv Detail & Related papers (2024-04-26T15:42:24Z)
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation [59.76645602354481]
FlashFace allows users to easily personalize their own photos by providing one or a few reference face images and a text prompt.
Our approach is distinguishable from existing human photo customization methods by higher-fidelity identity preservation and better instruction following.
arXiv Detail & Related papers (2024-03-25T17:59:57Z)
- When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
- GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, which is based on parametric 3D facial representations and can stably decouple expression.
We achieve higher-quality and more accurate facial expression transfer results compared to state-of-the-art methods, and demonstrate applicability to various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z)
- Face Generation from Textual Features using Conditionally Trained Inputs to Generative Adversarial Networks [0.0]
We use the power of state-of-the-art natural language processing models to convert face descriptions into learnable latent vectors.
The same approach can be tailored to generate any image based on fine-grained textual features.
arXiv Detail & Related papers (2023-01-22T13:27:12Z)
- AnyFace: Free-style Text-to-Face Synthesis and Manipulation [41.61972206254537]
This paper proposes the first free-style text-to-face method, namely AnyFace.
AnyFace enables much wider open-world applications such as the metaverse, social media, cosmetics, and forensics.
arXiv Detail & Related papers (2022-03-29T08:27:38Z)
- S2FGAN: Semantically Aware Interactive Sketch-to-Face Translation [11.724779328025589]
This paper proposes a sketch-to-image generation framework called S2FGAN.
We employ two latent spaces to control the face appearance and adjust the desired attributes of the generated face.
Our method successfully outperforms state-of-the-art methods on attribute manipulation by exploiting greater control of attribute intensity.
arXiv Detail & Related papers (2020-11-30T13:42:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.