Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm
- URL: http://arxiv.org/abs/2403.11781v1
- Date: Mon, 18 Mar 2024 13:39:53 GMT
- Title: Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm
- Authors: Yi Wu, Ziqiang Li, Heliang Zheng, Chaoyue Wang, Bin Li,
- Abstract summary: We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization.
We introduce an identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information.
We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams.
- Score: 31.06269858216316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Drawing on recent advancements in diffusion models for text-to-image generation, identity-preserved personalization has made significant progress in accurately capturing specific identities with just a single reference image. However, existing methods primarily integrate reference images within the text embedding space, leading to a complex entanglement of image and text information, which poses challenges for preserving both identity fidelity and semantic consistency. To tackle this challenge, we propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization. Specifically, we introduce identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information while deactivating the original text cross-attention module of the diffusion model. This ensures that the image stream faithfully represents the identity provided by the reference image while mitigating interference from textual input. Additionally, we introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams. This mechanism not only enhances the fidelity of identity and semantic consistency but also enables convenient control over the styles of the generated images. Extensive experimental results on both raw photo generation and style image generation demonstrate the superior performance of our proposed method.
Related papers
- Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z) - EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance [20.430259028981094]
Zero-shot subject-driven image generation aims to produce images that incorporate a subject from a given example image.
The challenge lies in preserving the subject's identity while aligning with the text prompt which often requires modifying certain aspects of the subject's appearance.
Key findings include: (1) the design of the subject image encoder significantly impacts identity preservation quality, and (2) separating text and subject guidance is crucial for both text alignment and identity preservation.
arXiv Detail & Related papers (2024-09-12T14:44:45Z) - ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present textbfID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z) - Beyond Inserting: Learning Identity Embedding for Semantic-Fidelity Personalized Diffusion Generation [21.739328335601716]
This paper focuses on inserting accurate and interactive ID embedding into the Stable Diffusion Model for personalized generation.
We propose a face-wise attention loss to fit the face region instead of entangling ID-unrelated information, such as face layout and background.
Our results exhibit superior ID accuracy, text-based manipulation ability, and generalization compared to previous methods.
arXiv Detail & Related papers (2024-01-31T11:52:33Z) - When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for
Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $mathcalW_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z) - HFORD: High-Fidelity and Occlusion-Robust De-identification for Face
Privacy Protection [60.63915939982923]
Face de-identification is a practical way to solve the identity protection problem.
The existing facial de-identification methods have revealed several problems.
We present a High-Fidelity and Occlusion-Robust De-identification (HFORD) method to deal with these issues.
arXiv Detail & Related papers (2023-11-15T08:59:02Z) - DisenBooth: Identity-Preserving Disentangled Tuning for Subject-Driven
Text-to-Image Generation [50.39533637201273]
We propose DisenBooth, an identity-preserving disentangled tuning framework for subject-driven text-to-image generation.
By combining the identity-preserved embedding and identity-irrelevant embedding, DisenBooth demonstrates more generation flexibility and controllability.
arXiv Detail & Related papers (2023-05-05T09:08:25Z) - FaceDancer: Pose- and Occlusion-Aware High Fidelity Face Swapping [62.38898610210771]
We present a new single-stage method for subject face swapping and identity transfer, named FaceDancer.
We have two major contributions: Adaptive Feature Fusion Attention (AFFA) and Interpreted Feature Similarity Regularization (IFSR)
arXiv Detail & Related papers (2022-10-19T11:31:38Z) - Learning Disentangled Representation for One-shot Progressive Face
Swapping [65.98684203654908]
We present a simple yet efficient method named FaceSwapper, for one-shot face swapping based on Generative Adversarial Networks.
Our method consists of a disentangled representation module and a semantic-guided fusion module.
Our results show that our method achieves state-of-the-art results on benchmark with fewer training samples.
arXiv Detail & Related papers (2022-03-24T11:19:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.