AnyFace: Free-style Text-to-Face Synthesis and Manipulation
- URL: http://arxiv.org/abs/2203.15334v1
- Date: Tue, 29 Mar 2022 08:27:38 GMT
- Title: AnyFace: Free-style Text-to-Face Synthesis and Manipulation
- Authors: Jianxin Sun, Qiyao Deng, Qi Li, Muyi Sun, Min Ren, Zhenan Sun
- Abstract summary: This paper proposes the first free-style text-to-face method, namely AnyFace.
AnyFace enables much wider open-world applications such as the metaverse, social media, cosmetics, and forensics.
- Score: 41.61972206254537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing text-to-image synthesis methods are generally applicable only to
words in the training dataset. Human faces, however, are too variable to be
described with a limited vocabulary. This paper therefore proposes the first
free-style text-to-face method, namely AnyFace, enabling much wider open-world
applications such as the metaverse, social media, cosmetics, and forensics. AnyFace has a novel
two-stream framework for face image synthesis and manipulation given arbitrary
descriptions of the human face. Specifically, one stream performs text-to-face
generation and the other conducts face image reconstruction. Facial text and
image features are extracted using the CLIP (Contrastive Language-Image
Pre-training) encoders. A collaborative Cross Modal Distillation (CMD)
module is designed to align the linguistic and visual features across these two
streams. Furthermore, a Diverse Triplet Loss (DT loss) is developed to model
fine-grained features and improve facial diversity. Extensive experiments on
Multi-modal CelebA-HQ and CelebAText-HQ demonstrate significant advantages of
AnyFace over state-of-the-art methods. AnyFace can achieve high-quality,
high-resolution, and high-diversity face synthesis and manipulation results
without any constraints on the number and content of input captions.
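
The pipeline described in the abstract (frozen CLIP encoders feeding a cross-modal alignment module, plus a triplet-style diversity loss) can be illustrated with a short, hedged sketch. The example below assumes PyTorch and OpenAI's clip package; the file name face.png is a placeholder, and diverse_triplet_loss is a generic margin-based stand-in for the paper's DT loss, whose exact formulation is not reproduced here.

    # Minimal sketch: extract text and image features with frozen CLIP encoders,
    # plus a generic triplet objective standing in for the paper's DT loss.
    # Assumptions: OpenAI's clip package (github.com/openai/CLIP) and a
    # placeholder image file "face.png".
    import torch
    import torch.nn.functional as F
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Encode an arbitrary (free-style) caption and a face image.
    tokens = clip.tokenize(["a young woman with long curly red hair"]).to(device)
    image = preprocess(Image.open("face.png").convert("RGB")).unsqueeze(0).to(device)

    with torch.no_grad():
        text_feat = F.normalize(model.encode_text(tokens).float(), dim=-1)
        img_feat = F.normalize(model.encode_image(image).float(), dim=-1)

    def diverse_triplet_loss(anchor, positive, negative, margin=0.2):
        # Pull the matching text/image pair together and push a mismatched
        # pair apart on cosine distance. The paper's DT loss additionally
        # targets fine-grained features and output diversity; this is only
        # the standard margin-based skeleton.
        d_pos = 1.0 - F.cosine_similarity(anchor, positive, dim=-1)
        d_neg = 1.0 - F.cosine_similarity(anchor, negative, dim=-1)
        return F.relu(d_pos - d_neg + margin).mean()

In the paper's two-stream design, features such as text_feat would drive the generation stream while img_feat comes from the reconstruction stream, with the CMD module aligning the two sets of features before a loss of this kind is applied.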
Related papers
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation [59.76645602354481]
FlashFace allows users to easily personalize their own photos by providing one or a few reference face images and a text prompt.
Our approach is distinguished from existing human photo customization methods by higher-fidelity identity preservation and better instruction following.
arXiv Detail & Related papers (2024-03-25T17:59:57Z) - Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation
Using only Images [105.92311979305065]
TG-3DFace creates more realistic and aesthetically pleasing 3D faces, improving multi-view consistency (MVIC) by 9% over Latent3D.
The rendered face images generated by TG-3DFace achieve better FID and CLIP scores than text-to-2D face/image generation models.
arXiv Detail & Related papers (2023-08-31T14:26:33Z) - GaFET: Learning Geometry-aware Facial Expression Translation from
In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, which is based on parametric 3D facial representations and can stably decouple expressions.
We achieve higher-quality and more accurate facial expression transfer than state-of-the-art methods, and demonstrate applicability to various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z) - HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for
Controllable Text-Driven Person Image Generation [73.3790833537313]
Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on.
We propose HumanDiffusion, a coarse-to-fine alignment diffusion framework, for text-driven person image generation.
arXiv Detail & Related papers (2022-11-11T14:30:34Z) - Multi-Attributed and Structured Text-to-Face Synthesis [1.3381749415517017]
Generative Adversarial Networks (GANs) have revolutionized image synthesis through applications such as face generation, photograph editing, and image super-resolution.
This paper empirically proves that increasing the number of facial attributes in each textual description helps GANs generate more diverse and real-looking faces.
arXiv Detail & Related papers (2021-08-25T07:52:21Z) - Semantic Text-to-Face GAN -ST^2FG [0.7919810878571298]
We present a novel approach to generate facial images from semantic text descriptions.
For security and criminal identification, the ability to provide a GAN-based system that works like a sketch artist would be incredibly useful.
arXiv Detail & Related papers (2021-07-22T15:42:25Z) - Towards Open-World Text-Guided Face Image Generation and Manipulation [52.83401421019309]
We propose a unified framework for both face image generation and manipulation.
Our method supports open-world scenarios, including both image and text, without any re-training, fine-tuning, or post-processing.
arXiv Detail & Related papers (2021-04-18T16:56:07Z) - Faces \`a la Carte: Text-to-Face Generation via Attribute
Disentanglement [9.10088750358281]
Text-to-Face (TTF) is a challenging task with great potential for diverse computer vision applications.
We propose a Text-to-Face model that produces images in high resolution (1024x1024) with text-to-image consistency.
We refer to our model as TTF-HD. Experimental results show that TTF-HD generates high-quality faces with state-of-the-art performance.
arXiv Detail & Related papers (2020-06-13T10:24:31Z)