StyleT2F: Generating Human Faces from Textual Description Using
StyleGAN2
- URL: http://arxiv.org/abs/2204.07924v1
- Date: Sun, 17 Apr 2022 04:51:30 GMT
- Title: StyleT2F: Generating Human Faces from Textual Description Using
StyleGAN2
- Authors: Mohamed Shawky Sabae, Mohamed Ahmed Dardir, Remonda Talaat Eskarous,
Mohamed Ramzy Ebbed
- Abstract summary: StyleT2F is a method of controlling the output of StyleGAN2 using text.
Our method is shown to capture the required features correctly and to maintain consistency between the input text and the output images.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: AI-driven image generation has improved significantly in recent years.
Generative adversarial networks (GANs), like StyleGAN, are able to generate
high-quality realistic data and have artistic control over the output, as well.
In this work, we present StyleT2F, a method of controlling the output of
StyleGAN2 using text, in order to be able to generate a detailed human face
from textual description. We utilize StyleGAN's latent space to manipulate
different facial features and conditionally sample the required latent code,
which embeds the facial features mentioned in the input text. Our method is
shown to capture the required features correctly and to maintain consistency
between the input text and the output images. Moreover, our method ensures
disentanglement when manipulating a wide range of facial features that
sufficiently describe a human face.
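For intuition, below is a minimal, hypothetical sketch of the latent-code conditioning described above: a StyleGAN2-style latent vector is shifted along precomputed per-attribute directions for each facial feature mentioned in the input text. The attribute names, the random directions, and the keyword parser are illustrative stand-ins, not the authors' actual pipeline.

```python
# Hypothetical sketch: condition a latent code on text by shifting it along
# per-attribute directions. Names and directions are illustrative stand-ins.
import torch

LATENT_DIM = 512
torch.manual_seed(0)

# Precomputed, roughly disentangled directions in W space, one per facial
# attribute. In practice these could come from a method such as InterFaceGAN;
# random vectors here keep the demo self-contained.
ATTRIBUTE_DIRECTIONS = {
    "blond hair": torch.randn(LATENT_DIM),
    "smiling": torch.randn(LATENT_DIM),
    "eyeglasses": torch.randn(LATENT_DIM),
}

def parse_attributes(text: str) -> list[str]:
    """Naive keyword matcher standing in for the paper's text parsing."""
    return [name for name in ATTRIBUTE_DIRECTIONS if name in text.lower()]

def condition_latent(w: torch.Tensor, text: str, strength: float = 2.0) -> torch.Tensor:
    """Shift w along the (unit-normalized) direction of every mentioned attribute."""
    for name in parse_attributes(text):
        direction = ATTRIBUTE_DIRECTIONS[name]
        w = w + strength * direction / direction.norm()
    return w

# Stand-in for a StyleGAN2 mapping-network output; a real pipeline would feed
# the edited code into G.synthesis(w) to render the face.
w = torch.randn(1, LATENT_DIM)
w_edited = condition_latent(w, "A smiling woman with blond hair and eyeglasses")
print(w_edited.shape)  # torch.Size([1, 512])
```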
Related papers
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation [59.76645602354481]
FlashFace allows users to easily personalize their own photos by providing one or a few reference face images and a text prompt.
Our approach is distinguished from existing human photo customization methods by higher-fidelity identity preservation and better instruction following.
arXiv Detail & Related papers (2024-03-25T17:59:57Z) - When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for
Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z) - Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation
Using only Images [105.92311979305065]
TG-3DFace creates more realistic and aesthetically pleasing 3D faces, boosting multi-view consistency (MVIC) by 9% over Latent3D.
The rendered face images generated by TG-3DFace achieve higher FID and CLIP score than text-to-2D face/image generation models.
arXiv Detail & Related papers (2023-08-31T14:26:33Z) - StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces [103.54337984566877]
We use dilated convolutions to rescale the receptive fields of shallow layers in StyleGAN without altering any model parameters.
This allows fixed-size small features at shallow layers to be extended into larger ones that can accommodate variable resolutions.
We validate our method using unaligned face inputs of various resolutions in a diverse set of face manipulation tasks (a minimal sketch of the dilated-convolution trick appears after this list).
arXiv Detail & Related papers (2023-03-10T18:59:33Z) - HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for
Controllable Text-Driven Person Image Generation [73.3790833537313]
Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on.
We propose HumanDiffusion, a coarse-to-fine alignment diffusion framework, for text-driven person image generation.
arXiv Detail & Related papers (2022-11-11T14:30:34Z) - Text-to-Face Generation with StyleGAN2 [0.0]
We propose a novel framework to generate facial images that are well aligned with the input descriptions.
Our framework utilizes the high-resolution face generator, StyleGAN2, and explores the possibility of using it in T2F.
The generated images exhibit a 57% similarity to the ground-truth images, with a face semantic distance of 0.92, outperforming state-of-the-art work.
arXiv Detail & Related papers (2022-05-25T06:02:01Z) - AnyFace: Free-style Text-to-Face Synthesis and Manipulation [41.61972206254537]
This paper proposes the first free-style text-to-face method, namely AnyFace.
AnyFace enables much wider open-world applications such as the metaverse, social media, cosmetics, forensics, etc.
arXiv Detail & Related papers (2022-03-29T08:27:38Z) - Semantic Text-to-Face GAN -ST^2FG [0.7919810878571298]
We present a novel approach to generate facial images from semantic text descriptions.
For security and criminal identification, the ability to provide a GAN-based system that works like a sketch artist would be incredibly useful.
arXiv Detail & Related papers (2021-07-22T15:42:25Z) - StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt.
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation (a minimal sketch of the CLIP-guided optimization appears after this list).
arXiv Detail & Related papers (2021-03-31T17:51:25Z) - Faces \`a la Carte: Text-to-Face Generation via Attribute
Disentanglement [9.10088750358281]
Text-to-Face (TTF) is a challenging task with great potential for diverse computer vision applications.
We propose a Text-to-Face model that produces images in high resolution (1024x1024) with text-to-image consistency.
We refer to our model as TTF-HD. Experimental results show that TTF-HD generates high-quality faces with state-of-the-art performance.
arXiv Detail & Related papers (2020-06-13T10:24:31Z) - StyleGAN2 Distillation for Feed-forward Image Manipulation [5.5080625617632]
StyleGAN2 is a state-of-the-art network in generating realistic images.
We propose a way to distill a particular image manipulation of StyleGAN2 into an image-to-image network trained in a paired way.
arXiv Detail & Related papers (2020-03-07T14:02:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.