MagicNaming: Consistent Identity Generation by Finding a "Name Space" in T2I Diffusion Models
- URL: http://arxiv.org/abs/2412.14902v1
- Date: Thu, 19 Dec 2024 14:32:11 GMT
- Title: MagicNaming: Consistent Identity Generation by Finding a "Name Space" in T2I Diffusion Models
- Authors: Jing Zhao, Heliang Zheng, Chaoyue Wang, Long Lan, Wanrong Huang, Yuhua Tang
- Abstract summary: We explore the existence of a "Name Space", where any point in the space corresponds to a specific identity.
We first extract the embeddings of celebrities' names in the Laion5B dataset with the text encoder of diffusion models.
We experimentally find that such name embeddings work well in ensuring good identity consistency in the generated images.
- Score: 29.937693075899713
- Abstract: Large-scale text-to-image diffusion models (e.g., DALL-E, SDXL) are capable of generating famous persons by simply referring to their names. Is it possible to make such models generate generic identities as simply as the famous ones, e.g., just by using a name? In this paper, we explore the existence of a "Name Space", where any point in the space corresponds to a specific identity. Fortunately, we find some clues in the feature space spanned by the text embeddings of celebrities' names. Specifically, we first extract the embeddings of celebrities' names in the Laion5B dataset with the text encoder of diffusion models. Such embeddings are used as supervision to learn an encoder that can predict the name (actually an embedding) of a given face image. We experimentally find that such name embeddings work well in ensuring good identity consistency in the generated images. Note that, like the names of celebrities, our predicted name embeddings are disentangled from the semantics of text inputs, so the original generation capability of the text-to-image model is well preserved. Moreover, by simply plugging in such name embeddings, all variants (e.g., from Civitai) derived from the same base model (i.e., SDXL) readily become identity-aware text-to-image models. Project homepage: \url{https://magicfusion.github.io/MagicNaming/}.
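The training signal described in the abstract can be sketched as follows. This is a hedged stand-in, not the authors' architecture: random vectors play the role of face-backbone features and of the text-encoder embeddings of celebrity names, and a simple least-squares head replaces the learned encoder. The point is only to illustrate the supervision scheme: fit a map from face features into the text encoder's embedding space, so that at inference any face receives a predicted "name" embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in dimensions and data (hypothetical, for illustration only):
# face_feats mimics face-backbone features; name_embeds mimics the
# text-encoder embeddings of celebrity names used as supervision.
face_dim, name_dim, n = 512, 768, 1000
face_feats = rng.standard_normal((n, face_dim))
name_embeds = rng.standard_normal((n, name_dim))

# Closed-form least-squares fit of a linear head W: face space -> "Name Space".
W, *_ = np.linalg.lstsq(face_feats, name_embeds, rcond=None)

def predict_name_embedding(face_feature: np.ndarray) -> np.ndarray:
    """Map face features to a point in the 'Name Space' (illustrative)."""
    return face_feature @ W

predicted = predict_name_embedding(face_feats[:4])
```

In the paper the predicted embedding is then plugged into the prompt embedding of the diffusion model in place of a celebrity name; the sketch stops at the regression step.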
Related papers
- CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models [58.37569942713456]
CharacterFactory is a framework that allows sampling new characters with consistent identities in the latent space of GANs.
The whole model only takes 10 minutes for training, and can sample infinite characters end-to-end during inference.
arXiv Detail & Related papers (2024-04-24T06:15:31Z)
- Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval [53.89454443114146]
We study the zero-shot Composed Image Retrieval (ZS-CIR) task, which retrieves a target image given a reference image and a textual description, without training on triplet datasets.
Previous works generate pseudo-word tokens by projecting the reference image features to the text embedding space.
We propose a Knowledge-Enhanced Dual-stream zero-shot composed image retrieval framework (KEDs)
KEDs implicitly models the attributes of the reference images by incorporating a database.
arXiv Detail & Related papers (2024-03-24T04:23:56Z)
- Multicultural Name Recognition For Previously Unseen Names [65.268245109828]
This paper attempts to improve recognition of person names, a diverse category that can grow any time someone is born or changes their name.
I look at names from 103 countries to compare how well the model performs on names from different cultures.
I find that a model with combined character and word input outperforms word-only models and may improve on accuracy compared to classical NER models.
arXiv Detail & Related papers (2024-01-23T17:58:38Z)
- When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathscr{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
- Not with my name! Inferring artists' names of input strings employed by Diffusion Models [8.692128987695423]
Diffusion Models (DM) are highly effective at generating realistic, high-quality images.
However, these models lack creativity and merely compose outputs based on their training data.
This paper presents a preliminary study on inferring the probability that an artist's name was used in the input string that generated a given image.
arXiv Detail & Related papers (2023-07-25T14:18:58Z)
- Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors [40.959642112729234]
Peekaboo is a first-of-its-kind zero-shot, open-vocabulary, unsupervised semantic grounding technique.
We show how Peekaboo can be used to generate images with transparency, even though the underlying diffusion model was only trained on RGB images.
arXiv Detail & Related papers (2022-11-23T18:59:05Z)
- Schr\"{o}dinger's Bat: Diffusion Models Sometimes Generate Polysemous Words in Superposition [71.45263447328374]
Recent work has shown that text-to-image diffusion models can display strange behaviours when a prompt contains a word with multiple possible meanings.
We show that when given an input that is the sum of encodings of two distinct words, the model can produce an image containing both concepts represented in the sum.
We then demonstrate that the CLIP encoder used to encode prompts encodes polysemous words as a superposition of meanings, and that using linear algebraic techniques we can edit these representations to influence the senses represented in the generated images.
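The superposition observation and the linear edit can be illustrated with plain vectors. This sketch uses random stand-ins rather than real CLIP embeddings: if e(w) denotes the prompt encoder's embedding of word w, summing e(w1) + e(w2) yields a mixed encoding, and projecting out one word's direction is one simple linear-algebraic edit of the kind the paper describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings for two senses of a polysemous prompt word
# (random vectors, not real CLIP encodings).
e_bat_animal = rng.standard_normal(768)
e_bat_club = rng.standard_normal(768)

# Superposed encoding: the model can render both concepts from this sum.
mixed = e_bat_animal + e_bat_club

def remove_component(v: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project out the component of v along `direction` (a linear edit)."""
    d = direction / np.linalg.norm(direction)
    return v - np.dot(v, d) * d

# After the edit, the encoding is orthogonal to the "club" sense,
# biasing generation toward the remaining sense.
edited = remove_component(mixed, e_bat_club)
```

The actual experiments operate on full per-token CLIP prompt encodings inside a diffusion pipeline; the sketch only shows the vector arithmetic.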
arXiv Detail & Related papers (2022-11-23T16:26:49Z)
- Semantic Text-to-Face GAN -ST^2FG [0.7919810878571298]
We present a novel approach to generate facial images from semantic text descriptions.
For security and criminal identification, the ability to provide a GAN-based system that works like a sketch artist would be incredibly useful.
arXiv Detail & Related papers (2021-07-22T15:42:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.