Face0: Instantaneously Conditioning a Text-to-Image Model on a Face
- URL: http://arxiv.org/abs/2306.06638v1
- Date: Sun, 11 Jun 2023 09:52:03 GMT
- Title: Face0: Instantaneously Conditioning a Text-to-Image Model on a Face
- Authors: Dani Valevski, Danny Wasserman, Yossi Matias, Yaniv Leviathan
- Abstract summary: We present Face0, a novel way to instantaneously condition a text-to-image generation model on a face.
We augment a dataset of annotated images with embeddings of the included faces and train an image generation model on the augmented dataset.
Our method achieves pleasing results, is remarkably simple, extremely fast, and equips the underlying model with new capabilities.
- Score: 3.5150821092068383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Face0, a novel way to instantaneously condition a text-to-image
generation model on a face at sampling time, without any optimization procedures
such as fine-tuning or inversions. We augment a dataset of annotated images
with embeddings of the included faces and train an image generation model on
the augmented dataset. Once trained, our system is practically identical at
inference time to the underlying base model, and is therefore able to generate
images, given a user-supplied face image and a prompt, in just a couple of
seconds. Our method achieves pleasing results, is remarkably simple, extremely
fast, and equips the underlying model with new capabilities, such as
controlling the generated images either via text or via direct manipulation of
the input face embeddings. In addition, when using a fixed random vector
instead of a face embedding from a user-supplied image, our method essentially
solves the problem
of consistent character generation across images. Finally, while requiring
further research, we hope that our method, which decouples the model's textual
biases from its biases on faces, might be a step towards some mitigation of
biases in future text-to-image models.
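To make the mechanism concrete, here is a minimal sketch of one reading of the abstract: a face embedding (or, for consistent characters, a fixed random vector) is projected into a few extra conditioning tokens and appended to the text-encoder output that the diffusion model attends over. Everything below (FaceEncoderStub, D_FACE, N_FACE_TOKENS, the token counts) is an illustrative assumption, not the authors' code.

```python
# Hypothetical sketch of Face0-style conditioning; not the paper's implementation.
import torch
import torch.nn as nn

D_TEXT, D_FACE, N_FACE_TOKENS = 768, 512, 4  # assumed dimensions

class FaceEncoderStub(nn.Module):
    """Stand-in for a pretrained face-recognition encoder (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.LazyLinear(D_FACE))

    def forward(self, face_image: torch.Tensor) -> torch.Tensor:
        return self.net(face_image)  # (B, D_FACE)

class FaceConditioner(nn.Module):
    """Projects a face embedding into extra tokens appended to the text tokens."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(D_FACE, N_FACE_TOKENS * D_TEXT)

    def forward(self, text_tokens: torch.Tensor, face_emb: torch.Tensor) -> torch.Tensor:
        face_tokens = self.proj(face_emb).view(-1, N_FACE_TOKENS, D_TEXT)
        # The diffusion UNet would cross-attend over this longer sequence.
        return torch.cat([text_tokens, face_tokens], dim=1)

def face_embedding(face_image=None, encoder=None, fixed_seed=0):
    """Encode a user-supplied face; with no image, fall back to a fixed
    random vector, which per the abstract yields a consistent character."""
    if face_image is not None:
        return encoder(face_image)
    g = torch.Generator().manual_seed(fixed_seed)
    return torch.randn(1, D_FACE, generator=g)

if __name__ == "__main__":
    encoder, conditioner = FaceEncoderStub(), FaceConditioner()
    text_tokens = torch.randn(1, 77, D_TEXT)  # e.g. a CLIP text-encoder output
    # Case 1: condition on a user-supplied face image.
    emb = face_embedding(torch.randn(1, 3, 160, 160), encoder)
    print(conditioner(text_tokens, emb).shape)  # torch.Size([1, 81, 768])
    # Case 2: fixed random "identity" for consistent character generation.
    emb = face_embedding(fixed_seed=42)
    print(conditioner(text_tokens, emb).shape)
```

Note that in this sketch the base model is untouched at inference; only the conditioning sequence grows by a few tokens, which is consistent with the abstract's claim that the trained system is practically identical to the underlying model at sampling time.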
Related papers
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
Diffusion models have dominated the field of large generative image models.
We propose an algorithm for fast constrained sampling in large pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-24T14:52:38Z) - Improving face generation quality and prompt following with synthetic captions [57.47448046728439]
We introduce a training-free pipeline designed to generate accurate appearance descriptions from images of people.
We then use these synthetic captions to fine-tune a text-to-image diffusion model.
Our results demonstrate that this approach significantly improves the model's ability to generate high-quality, realistic human faces.
arXiv Detail & Related papers (2024-05-17T15:50:53Z) - Regeneration Based Training-free Attribution of Fake Images Generated by
Text-to-Image Generative Models [39.33821502730661]
We present a training-free method to attribute fake images generated by text-to-image models to their source models.
By calculating and ranking the similarity of the test image and the candidate images, we can determine the source of the image.
arXiv Detail & Related papers (2024-03-03T11:55:49Z) - DreamIdentity: Improved Editability for Efficient Face-identity
Preserved Image Generation [69.16517915592063]
We propose a novel face-identity encoder to learn an accurate representation of human faces.
We also propose self-augmented editability learning to enhance the editability of models.
Our methods can generate identity-preserved images under different scenes at a much faster speed.
arXiv Detail & Related papers (2023-07-01T11:01:17Z) - eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert
Denoisers [87.52504764677226]
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis.
We train an ensemble of text-to-image diffusion models specialized for different stages of synthesis.
Our ensemble of diffusion models, called eDiffi, results in improved text alignment while maintaining the same inference cost.
arXiv Detail & Related papers (2022-11-02T17:43:04Z) - LAFITE: Towards Language-Free Training for Text-to-Image Generation [83.2935513540494]
We propose the first work to train text-to-image generation models without any text data.
Our method leverages the well-aligned multi-modal semantic space of the powerful pre-trained CLIP model.
We obtain state-of-the-art results in the standard text-to-image generation tasks.
arXiv Detail & Related papers (2021-11-27T01:54:45Z) - S2FGAN: Semantically Aware Interactive Sketch-to-Face Translation [11.724779328025589]
This paper proposes a sketch-to-image generation framework called S2FGAN.
We employ two latent spaces to control the face appearance and adjust the desired attributes of the generated face.
Our method outperforms state-of-the-art methods on attribute manipulation by providing greater control over attribute intensity.
arXiv Detail & Related papers (2020-11-30T13:42:39Z) - Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z) - Face Attribute Invertion [0.0]
We propose a novel self-perception method based on GANs for automatic face attribute inversion.
Our model is quite stable in training and capable of preserving finer details of the original face images.
arXiv Detail & Related papers (2020-01-14T08:41:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.