InstantID: Zero-shot Identity-Preserving Generation in Seconds
- URL: http://arxiv.org/abs/2401.07519v2
- Date: Fri, 2 Feb 2024 16:15:22 GMT
- Title: InstantID: Zero-shot Identity-Preserving Generation in Seconds
- Authors: Qixun Wang, Xu Bai, Haofan Wang, Zekui Qin, Anthony Chen, Huaxia Li,
Xu Tang, and Yao Hu
- Abstract summary: We introduce InstantID, a powerful diffusion model-based solution for ID embedding.
Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image.
Our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL.
- Score: 21.04236321562671
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been significant progress in personalized image synthesis with
methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world
applicability is hindered by high storage demands, lengthy fine-tuning
processes, and the need for multiple reference images. Conversely, existing ID
embedding-based methods, while requiring only a single forward inference, face
challenges: they either necessitate extensive fine-tuning across numerous model
parameters, lack compatibility with community pre-trained models, or fail to
maintain high face fidelity. Addressing these limitations, we introduce
InstantID, a powerful diffusion model-based solution. Our plug-and-play module
adeptly handles image personalization in various styles using just a single
facial image, while ensuring high fidelity. To achieve this, we design a novel
IdentityNet by imposing strong semantic and weak spatial conditions,
integrating facial and landmark images with textual prompts to steer the image
generation. InstantID demonstrates exceptional performance and efficiency,
proving highly beneficial in real-world applications where identity
preservation is paramount. Moreover, our work seamlessly integrates with
popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving
as an adaptable plugin. Our codes and pre-trained checkpoints will be available
at https://github.com/InstantID/InstantID.
Related papers
- Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z) - Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training [51.87027943520492]
We present a novel paradigm Diffusion-ReID to efficiently augment and generate diverse images based on known identities.
Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities.
arXiv Detail & Related papers (2024-06-10T06:26:03Z) - InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation [0.0]
"InstantFamily" is an approach that employs a novel cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation.
Our method effectively preserves ID as it utilizes global and local features from a pre-trained face recognition model integrated with text conditions.
arXiv Detail & Related papers (2024-04-30T10:16:21Z) - ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present textbfID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z) - Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization.
We introduce an identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information.
We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams.
arXiv Detail & Related papers (2024-03-18T13:39:53Z) - Beyond Inserting: Learning Identity Embedding for Semantic-Fidelity Personalized Diffusion Generation [21.739328335601716]
This paper focuses on inserting accurate and interactive ID embedding into the Stable Diffusion Model for personalized generation.
We propose a face-wise attention loss to fit the face region instead of entangling ID-unrelated information, such as face layout and background.
Our results exhibit superior ID accuracy, text-based manipulation ability, and generalization compared to previous methods.
arXiv Detail & Related papers (2024-01-31T11:52:33Z) - PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved
Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation.
PortraitBooth eliminates computational overhead and mitigates identity distortion.
It incorporates emotion-aware cross-attention control for diverse facial expressions in generated images.
arXiv Detail & Related papers (2023-12-11T13:03:29Z) - PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding [102.07914175196817]
PhotoMaker is an efficient personalized text-to-image generation method.
It encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information.
arXiv Detail & Related papers (2023-12-07T17:32:29Z) - Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization.
We learn an identity encoder which can extract an identity representation from a set of reference images of a subject.
We show that our approach consistently outperforms existing fine-tuning based approach in both image generation and reconstruction.
arXiv Detail & Related papers (2023-04-14T23:32:24Z) - MetaPortrait: Identity-Preserving Talking Head Generation with Fast
Personalized Adaptation [57.060828009199646]
We propose an ID-preserving talking head generation framework.
We claim that dense landmarks are crucial to achieving accurate geometry-aware flow fields.
We adaptively fuse the source identity during synthesis, so that the network better preserves the key characteristics of the image portrait.
arXiv Detail & Related papers (2022-12-15T18:59:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.