PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
- URL: http://arxiv.org/abs/2312.04461v1
- Date: Thu, 7 Dec 2023 17:32:29 GMT
- Title: PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
- Authors: Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying
Shan
- Abstract summary: PhotoMaker is an efficient personalized text-to-image generation method.
It encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information.
- Score: 102.07914175196817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in text-to-image generation have made remarkable progress in
synthesizing realistic human photos conditioned on given text prompts. However,
existing personalized generation methods cannot simultaneously satisfy the
requirements of high efficiency, promising identity (ID) fidelity, and flexible
text controllability. In this work, we introduce PhotoMaker, an efficient
personalized text-to-image generation method, which mainly encodes an arbitrary
number of input ID images into a stack ID embedding for preserving ID
information. Such an embedding, serving as a unified ID representation, can not
only encapsulate the characteristics of the same input ID comprehensively, but
also accommodate the characteristics of different IDs for subsequent
integration. This paves the way for more intriguing and practically valuable
applications. Besides, to drive the training of our PhotoMaker, we propose an
ID-oriented data construction pipeline to assemble the training data. Under the
nourishment of the dataset constructed through the proposed pipeline, our
PhotoMaker demonstrates better ID preservation ability than test-time
fine-tuning based methods, yet provides significant speed improvements,
high-quality generation results, strong generalization capabilities, and a wide
range of applications. Our project page is available at
https://photo-maker.github.io/
Related papers
- ID-Patch: Robust ID Association for Group Photo Personalization [29.38844265790726]
ID-Patch is a novel method that provides robust association between identities and 2D positions.
Our approach generates an ID patch and ID embeddings from the same facial features.
arXiv Detail & Related papers (2024-11-20T18:55:28Z) - Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training [51.87027943520492]
We present a novel paradigm Diffusion-ReID to efficiently augment and generate diverse images based on known identities.
Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities.
arXiv Detail & Related papers (2024-06-10T06:26:03Z) - Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter [23.690420512911146]
We propose a lightweight Inv-Adapter, which first extracts diffusion-domain representations of ID images utilizing a pre-trained text-to-image model via DDIM image inversion.
Benefiting from the high alignment of the extracted ID prompt features and the intermediate features of the text-to-image model, we then embed them efficiently into the base text-to-image model.
We conduct extensive experiments to assess ID fidelity, generation loyalty, speed, and training parameters, all of which show that the proposed Inv-Adapter is highly competitive in ID customization generation and model scale.
arXiv Detail & Related papers (2024-06-05T02:59:08Z) - InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation [0.0]
"InstantFamily" is an approach that employs a novel cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation.
Our method effectively preserves ID as it utilizes global and local features from a pre-trained face recognition model integrated with text conditions.
arXiv Detail & Related papers (2024-04-30T10:16:21Z) - ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present textbfID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z) - StableIdentity: Inserting Anybody into Anywhere at First Sight [57.99693188913382]
We propose StableIdentity, which allows identity-consistent recontextualization with just one face image.
We are the first to directly inject the identity learned from a single image into video/3D generation without finetuning.
arXiv Detail & Related papers (2024-01-29T09:06:15Z) - InstantID: Zero-shot Identity-Preserving Generation in Seconds [21.04236321562671]
We introduce InstantID, a powerful diffusion model-based solution for ID embedding.
Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image.
Our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL.
arXiv Detail & Related papers (2024-01-15T07:50:18Z) - Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization.
We learn an identity encoder which can extract an identity representation from a set of reference images of a subject.
We show that our approach consistently outperforms existing fine-tuning based approach in both image generation and reconstruction.
arXiv Detail & Related papers (2023-04-14T23:32:24Z) - Unified Multi-Modal Latent Diffusion for Joint Subject and Text
Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method is able to generate high-quality images with complex semantics from both aspects of input texts and images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.