PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
- URL: http://arxiv.org/abs/2312.04461v1
- Date: Thu, 7 Dec 2023 17:32:29 GMT
- Title: PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
- Authors: Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan
- Abstract summary: PhotoMaker is an efficient personalized text-to-image generation method.
It encodes an arbitrary number of input ID images into a stacked ID embedding to preserve ID information.
- Score: 102.07914175196817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in text-to-image generation have made remarkable progress in
synthesizing realistic human photos conditioned on given text prompts. However,
existing personalized generation methods cannot simultaneously satisfy the
requirements of high efficiency, promising identity (ID) fidelity, and flexible
text controllability. In this work, we introduce PhotoMaker, an efficient
personalized text-to-image generation method, which mainly encodes an arbitrary
number of input ID images into a stacked ID embedding for preserving ID
information. Such an embedding, serving as a unified ID representation, can not
only encapsulate the characteristics of the same input ID comprehensively, but
also accommodate the characteristics of different IDs for subsequent
integration. This paves the way for more intriguing and practically valuable
applications. Besides, to drive the training of our PhotoMaker, we propose an
ID-oriented data construction pipeline to assemble the training data. Trained
on the dataset built with this pipeline, our PhotoMaker demonstrates better
ID preservation than test-time fine-tuning based methods while also providing
significant speed improvements, high-quality generation results, strong
generalization capabilities, and a wide
range of applications. Our project page is available at
https://photo-maker.github.io/
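The core idea of the abstract above can be sketched as follows. This is a minimal, illustrative Python sketch, not the paper's implementation: it assumes each input ID image has already been encoded into a fixed-size feature vector, and the function names (`stack_id_embedding`, `fuse_with_class_token`) are invented for this example.

```python
from typing import List

def stack_id_embedding(id_features: List[List[float]]) -> List[List[float]]:
    """Stack an arbitrary number of per-image ID feature vectors into one
    (num_images, dim) "stacked" embedding, so downstream attention can see
    every input view of the same identity."""
    if not id_features:
        raise ValueError("at least one ID image embedding is required")
    dim = len(id_features[0])
    if any(len(f) != dim for f in id_features):
        raise ValueError("all ID embeddings must share the same dimension")
    return [list(f) for f in id_features]

def fuse_with_class_token(stacked: List[List[float]],
                          class_token: List[float]) -> List[List[float]]:
    """Illustrative fusion step (an assumption, not the paper's exact
    operator): average each image embedding with the text-side class-word
    embedding so the ID tokens stay aligned with the prompt."""
    return [[(a + b) / 2 for a, b in zip(f, class_token)] for f in stacked]
```

Stacking rather than averaging the per-image features is what lets the embedding both aggregate multiple photos of one person and hold features of different identities side by side for later mixing.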
Related papers
- AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection [72.41427550339296]
We introduce AnyMaker, a framework capable of generating general objects with high ID fidelity and flexible text editability.
The efficacy of AnyMaker stems from its novel general ID extraction, dual-level ID injection, and ID-aware decoupling.
To validate our approach and boost the research of general object customization, we create the first large-scale general ID dataset.
arXiv Detail & Related papers (2024-06-17T15:26:22Z)
- Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training [51.87027943520492]
We present a novel paradigm Diffusion-ReID to efficiently augment and generate diverse images based on known identities.
Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities.
arXiv Detail & Related papers (2024-06-10T06:26:03Z)
- Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter [23.690420512911146]
We propose a lightweight Inv-Adapter, which first extracts diffusion-domain representations of ID images utilizing a pre-trained text-to-image model via DDIM image inversion.
Benefiting from the high alignment of the extracted ID prompt features and the intermediate features of the text-to-image model, we then embed them efficiently into the base text-to-image model.
We conduct extensive experiments to assess ID fidelity, generation loyalty, speed, and training parameters, all of which show that the proposed Inv-Adapter is highly competitive in ID customization generation and model scale.
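The DDIM image inversion that Inv-Adapter relies on can be illustrated with a single deterministic inversion step. This is a generic sketch of standard DDIM inversion under simplified assumptions (flat lists instead of tensors, model noise prediction passed in as an argument), not code from the Inv-Adapter paper.

```python
import math
from typing import List

def ddim_inversion_step(x_t: List[float], eps_pred: List[float],
                        alpha_t: float, alpha_next: float) -> List[float]:
    """One deterministic DDIM inversion step (image -> noise direction).

    x_t:        current latent at timestep t (flattened)
    eps_pred:   noise predicted by the diffusion model at timestep t
    alpha_t:    cumulative alpha-bar at timestep t
    alpha_next: cumulative alpha-bar at the next (noisier) timestep
    """
    # Recover the model's current estimate of the clean latent x0.
    pred_x0 = [(x - math.sqrt(1.0 - alpha_t) * e) / math.sqrt(alpha_t)
               for x, e in zip(x_t, eps_pred)]
    # Re-noise that estimate toward the next timestep, reusing eps_pred.
    return [math.sqrt(alpha_next) * x0 + math.sqrt(1.0 - alpha_next) * e
            for x0, e in zip(pred_x0, eps_pred)]
```

Iterating this step from a real image's latent up the noise schedule yields the diffusion-domain representation of the ID image that the adapter then injects into the base model.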
arXiv Detail & Related papers (2024-06-05T02:59:08Z)
- InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation [0.0]
"InstantFamily" is an approach that employs a novel cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation.
Our method effectively preserves ID as it utilizes global and local features from a pre-trained face recognition model integrated with text conditions.
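The masked cross-attention idea behind multi-ID generation can be sketched in a few lines: scores for keys outside a query's mask are set to negative infinity, so each spatial location only attends to the ID tokens assigned to its region. This is a generic single-query illustration, not InstantFamily's actual mechanism; it assumes at least one key is unmasked.

```python
import math
from typing import List

def masked_attention(query: List[float], keys: List[List[float]],
                     values: List[List[float]], mask: List[bool]) -> List[float]:
    """Scaled dot-product attention for one query, with a boolean key mask.
    Masked-out keys receive a score of -inf and thus zero attention weight."""
    dim = len(query)
    scores = []
    for k, keep in zip(keys, mask):
        s = sum(q * kk for q, kk in zip(query, k)) / math.sqrt(dim)
        scores.append(s if keep else float("-inf"))
    # Numerically stable softmax over the (partially masked) scores.
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors.
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, vi in enumerate(v):
            out[i] += w * vi
    return out
```

In a multi-ID setting, the mask would come from per-identity face regions, so person A's queries never mix with person B's ID tokens.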
arXiv Detail & Related papers (2024-04-30T10:16:21Z)
- ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z)
- Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization.
We introduce an identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information.
We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams.
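The AdaIN-style statistic matching used to merge the two streams can be sketched as below. This shows classic AdaIN (re-normalizing one feature stream to another's mean and standard deviation); the paper's specific "AdaIN-mean" variant is assumed to build on this idea, and the exact operator here is an illustration, not the paper's code.

```python
import statistics
from typing import List

def adain(content: List[float], style: List[float],
          eps: float = 1e-5) -> List[float]:
    """Classic AdaIN: shift/scale the content features so their mean and
    standard deviation match those of the style features."""
    mu_c = statistics.fmean(content)
    sd_c = statistics.pstdev(content)
    mu_s = statistics.fmean(style)
    sd_s = statistics.pstdev(style)
    # Normalize content to zero mean / unit std, then apply style statistics.
    return [(x - mu_c) / (sd_c + eps) * sd_s + mu_s for x in content]
```

Matching only the channel statistics (rather than the full features) is what lets the identity stream and the text-semantics stream be merged without one overwriting the other's content.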
arXiv Detail & Related papers (2024-03-18T13:39:53Z)
- StableIdentity: Inserting Anybody into Anywhere at First Sight [57.99693188913382]
We propose StableIdentity, which allows identity-consistent recontextualization with just one face image.
We are the first to directly inject the identity learned from a single image into video/3D generation without finetuning.
arXiv Detail & Related papers (2024-01-29T09:06:15Z)
- InstantID: Zero-shot Identity-Preserving Generation in Seconds [21.04236321562671]
We introduce InstantID, a powerful diffusion model-based solution for ID embedding.
Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image.
Our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL.
arXiv Detail & Related papers (2024-01-15T07:50:18Z)
- Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization.
We learn an identity encoder which can extract an identity representation from a set of reference images of a subject.
We show that our approach consistently outperforms existing fine-tuning based approaches in both image generation and reconstruction.
arXiv Detail & Related papers (2023-04-14T23:32:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.