PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium
- URL: http://arxiv.org/abs/2412.15674v1
- Date: Fri, 20 Dec 2024 08:41:25 GMT
- Title: PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium
- Authors: Xinzhe Li, Jiahui Zhan, Shengfeng He, Yangyang Xu, Junyu Dong, Huaidong Zhang, Yong Du
- Abstract summary: PersonaMagic is a stage-regulated generative technique designed for high-fidelity face customization.
Our method learns a series of embeddings within a specific timestep interval to capture face concepts.
Tests confirm the superiority of PersonaMagic over state-of-the-art methods in both qualitative and quantitative evaluations.
- Score: 55.72249032433108
- Abstract: Personalized image generation has made significant strides in adapting content to novel concepts. However, a persistent challenge remains: balancing accurate reconstruction of unseen concepts against prompt-driven editability, especially when dealing with the complex nuances of facial features. In this study, we delve into the temporal dynamics of the text-to-image conditioning process, emphasizing the crucial role of stage partitioning in introducing new concepts. We present PersonaMagic, a stage-regulated generative technique designed for high-fidelity face customization. Using a simple MLP network, our method learns a series of embeddings within a specific timestep interval to capture face concepts. Additionally, we develop a Tandem Equilibrium mechanism that adjusts self-attention responses in the text encoder, balancing text-description fidelity against identity preservation and improving both. Extensive experiments confirm the superiority of PersonaMagic over state-of-the-art methods in both qualitative and quantitative evaluations. Moreover, its robustness and flexibility are validated in non-facial domains, and it can also serve as a valuable plug-in for enhancing the performance of pretrained personalization models.
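To make the stage-regulated idea concrete, here is a minimal PyTorch sketch that learns a separate concept embedding per timestep stage via a small MLP; the stage count, dimensions, and all names are illustrative assumptions, not PersonaMagic's released implementation.

```python
import torch
import torch.nn as nn

class StageEmbedder(nn.Module):
    """Learn one concept embedding per diffusion stage via a small MLP.

    All names, dimensions, and the uniform stage partition are illustrative
    assumptions, not PersonaMagic's released implementation.
    """
    def __init__(self, n_stages: int = 4, d_text: int = 768, d_hidden: int = 256):
        super().__init__()
        self.n_stages = n_stages
        self.stage_codes = nn.Embedding(n_stages, d_hidden)  # learnable per-stage code
        self.mlp = nn.Sequential(
            nn.Linear(d_hidden, d_hidden), nn.SiLU(), nn.Linear(d_hidden, d_text)
        )

    def forward(self, t: torch.Tensor, t_max: int = 1000) -> torch.Tensor:
        # Map each timestep to its stage index, then to a concept embedding
        # that substitutes for the placeholder token at that stage.
        stage = (t.float() / t_max * self.n_stages).long().clamp(max=self.n_stages - 1)
        return self.mlp(self.stage_codes(stage))  # (batch, d_text)

embedder = StageEmbedder()
t = torch.randint(0, 1000, (8,))  # sampled diffusion timesteps
print(embedder(t).shape)          # torch.Size([8, 768])
```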
Related papers
- FaceMe: Robust Blind Face Restoration with Personal Identification [27.295878867436688]
We propose a personalized face restoration method, FaceMe, based on a diffusion model.
Given a single or a few reference images, we use an identity encoder to extract identity-related features, which serve as prompts to guide the diffusion model in restoring high-quality facial images.
Experimental results demonstrate that our FaceMe can restore high-quality facial images while maintaining identity consistency, achieving excellent performance and robustness.
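A minimal sketch of the identity-as-prompt idea: a tiny CNN stands in for FaceMe's actual identity backbone, and a linear head emits a few prompt tokens; shapes and names are assumptions for illustration.

```python
import torch
import torch.nn as nn

class IdentityPromptEncoder(nn.Module):
    """Map reference face crops to prompt tokens for a diffusion model.

    A tiny CNN stands in for FaceMe's identity backbone; token count and
    dimensions are assumptions for illustration.
    """
    def __init__(self, n_tokens: int = 4, d_text: int = 768):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_tokens = nn.Linear(64, n_tokens * d_text)
        self.n_tokens, self.d_text = n_tokens, d_text

    def forward(self, refs: torch.Tensor) -> torch.Tensor:
        # refs: (n_refs, 3, H, W); average identity features over the references
        # so one or several reference images yield the same-shaped prompt.
        feat = self.backbone(refs).mean(dim=0)  # (64,)
        return self.to_tokens(feat).view(self.n_tokens, self.d_text)

encoder = IdentityPromptEncoder()
prompt_tokens = encoder(torch.randn(3, 3, 112, 112))  # three reference crops
print(prompt_tokens.shape)                            # torch.Size([4, 768])
```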
arXiv Detail & Related papers (2025-01-09T11:52:54Z)
- Foundation Cures Personalization: Recovering Facial Personalized Models' Prompt Consistency [33.35678923549471]
FreeCure is a training-free framework that harnesses the intrinsic knowledge from the foundation models themselves to improve the prompt consistency of personalization models.
We enhance multiple attributes in the outputs of personalization models through a novel noise-blending strategy and an inversion-based process.
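A hedged sketch of what a noise-blending step could look like: mix the frozen foundation model's prediction into the personalized model's at each denoising step. The blend weight and the linear rule are assumptions, not FreeCure's published procedure.

```python
import torch

def blend_noise(eps_personalized: torch.Tensor,
                eps_foundation: torch.Tensor,
                alpha: float = 0.5) -> torch.Tensor:
    """Mix the foundation model's noise prediction into the personalized
    model's, so prompt attributes the personalized model has drifted away
    from are partially restored. Illustrative, not FreeCure's exact rule."""
    return alpha * eps_foundation + (1.0 - alpha) * eps_personalized

# Query both frozen UNets on the same latent, then continue the sampler
# with the blended prediction.
eps_p = torch.randn(1, 4, 64, 64)  # personalized model prediction (latent space)
eps_f = torch.randn(1, 4, 64, 64)  # foundation model prediction
print(blend_noise(eps_p, eps_f).shape)  # torch.Size([1, 4, 64, 64])
```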
arXiv Detail & Related papers (2024-11-22T15:21:38Z)
- FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization [24.600720169589334]
The adapter-based approach gains the ability to customize and generate portraits through text-to-image training on facial data.
However, there is often a significant decrease in text-following ability, controllability, and diversity of the generated faces compared to the base model.
We propose the Face Adapter with deCoupled Training (FACT) framework, focusing on both model architecture and training strategy.
arXiv Detail & Related papers (2024-10-16T07:25:24Z)
- DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation [34.372331192321944]
We introduce DreamSalon, a noise-guided, staged-editing framework.
It focuses on detailed image manipulations and identity-context preservation.
Experiments demonstrate DreamSalon's ability to efficiently and faithfully edit fine details on human faces.
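The staged idea can be pictured as a timestep-dependent editing schedule: high-noise steps set layout and identity context, low-noise steps refine fine details. The boundaries and strengths below are invented for illustration and are not DreamSalon's actual parameters.

```python
def edit_schedule(t: int, t_max: int = 1000) -> dict:
    """Pick an editing mode from the current denoising stage. Boundaries and
    strengths are assumptions, not DreamSalon's published values."""
    s = t / t_max
    if s > 0.7:
        return {"stage": "coarse", "edit_strength": 0.1}    # protect identity-context
    if s > 0.3:
        return {"stage": "semantic", "edit_strength": 0.6}  # apply the main edit
    return {"stage": "detail", "edit_strength": 0.9}        # fine-detail touch-up

for t in (900, 500, 100):
    print(t, edit_schedule(t))
```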
arXiv Detail & Related papers (2024-03-28T08:47:02Z)
- DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation [84.0586749616249]
This paper presents DiffFAE, a one-stage and highly-efficient diffusion-based framework tailored for high-fidelity Facial Appearance Editing.
For high-fidelity query attributes transfer, we adopt Space-sensitive Physical Customization (SPC), which ensures the fidelity and generalization ability.
To preserve source attributes, we introduce the Region-responsive Semantic Composition (RSC) module, which is guided to learn decoupled source-regarding features, thereby better preserving identity and alleviating artifacts from non-facial attributes such as hair, clothes, and background.
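One way to picture region-responsive composition is a mask-gated blend of source and query features; this is a simplification with illustrative shapes, not the RSC module itself.

```python
import torch

def region_compose(source_feat: torch.Tensor,
                   query_feat: torch.Tensor,
                   region_mask: torch.Tensor) -> torch.Tensor:
    """Gate the transfer with a facial-region mask: query appearance inside
    the face, untouched source features for hair, clothes, and background."""
    return region_mask * query_feat + (1 - region_mask) * source_feat

src = torch.randn(1, 320, 32, 32)  # source-image features (toy shapes)
qry = torch.randn(1, 320, 32, 32)  # query-attribute features
mask = torch.zeros(1, 1, 32, 32)
mask[..., 8:24, 8:24] = 1.0        # toy face region
print(region_compose(src, qry, mask).shape)  # torch.Size([1, 320, 32, 32])
```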
arXiv Detail & Related papers (2024-03-26T12:53:10Z)
- Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters [67.28751868277611]
Recent work has demonstrated the ability to customize text-to-image diffusion models to multiple fine-grained concepts in a sequential manner.
We show that the capacity to learn new tasks reaches saturation over longer sequences.
We introduce a novel method, STack-And-Mask INcremental Adapters (STAMINA), which is composed of low-ranked attention-masked adapters and customized tokens.
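A compact sketch of a low-rank adapter gated by a learnable mask, in the spirit of STAMINA; the soft sigmoid mask here is a simplification of the paper's hard attention masks, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MaskedLoRA(nn.Module):
    """Low-rank adapter gated by a learnable per-feature mask. The soft
    sigmoid mask is a simplification of STAMINA's hard attention masks."""
    def __init__(self, d: int = 768, rank: int = 4):
        super().__init__()
        self.down = nn.Linear(d, rank, bias=False)
        self.up = nn.Linear(rank, d, bias=False)
        nn.init.zeros_(self.up.weight)                   # adapter starts as a no-op
        self.mask_logits = nn.Parameter(torch.zeros(d))  # learnable sparsity mask

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = torch.sigmoid(self.mask_logits)
        return x + mask * self.up(self.down(x))

# Stacking one adapter per concept (and freezing earlier ones) is the
# incremental part: new faces are learned without overwriting old ones.
layer = MaskedLoRA()
print(layer(torch.randn(2, 77, 768)).shape)  # torch.Size([2, 77, 768])
```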
arXiv Detail & Related papers (2023-11-30T18:04:21Z)
- When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
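A minimal sketch of mapping a StyleGAN $\mathcal{W}_+$ latent into the diffusion model's text-embedding space; the projection head and all dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class WPlusAdapter(nn.Module):
    """Project a StyleGAN W+ latent into text-embedding space so the diffusion
    model can condition on it. Head and dimensions are assumptions."""
    def __init__(self, n_styles: int = 18, d_w: int = 512, d_text: int = 768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(n_styles * d_w, 1024), nn.SiLU(), nn.Linear(1024, d_text)
        )

    def forward(self, w_plus: torch.Tensor) -> torch.Tensor:
        # w_plus: (batch, 18, 512), e.g. from StyleGAN inversion of a face photo.
        return self.proj(w_plus.flatten(1))  # one identity token per image

adapter = WPlusAdapter()
identity_token = adapter(torch.randn(2, 18, 512))
print(identity_token.shape)  # torch.Size([2, 768])
# Editing in W+ first (e.g., along a smile direction) and re-projecting is how
# StyleGAN editing directions would carry over to the diffusion output.
```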
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models [19.519789922033034]
PhotoVerse is an innovative methodology that incorporates a dual-branch conditioning mechanism in both text and image domains.
After a single training phase, our approach enables generating high-quality images within only a few seconds.
arXiv Detail & Related papers (2023-09-11T19:59:43Z)
- DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation [69.16517915592063]
We propose a novel face-identity encoder to learn an accurate representation of human faces.
We also propose self-augmented editability learning to enhance the editability of models.
Our methods can generate identity-preserved images under different scenes at a much faster speed.
arXiv Detail & Related papers (2023-07-01T11:01:17Z)
- MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation [57.060828009199646]
We propose an ID-preserving talking head generation framework.
We claim that dense landmarks are crucial to achieving accurate geometry-aware flow fields.
We adaptively fuse the source identity during synthesis, so that the network better preserves the key characteristics of the image portrait.
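Geometry-aware warping of this kind can be sketched as a dense flow field applied via `grid_sample`; the random flow below stands in for one a real model would predict from dense landmarks.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(source: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a source portrait with a dense flow field via grid_sample; a real
    model would predict the flow from dense landmarks."""
    n, _, h, w = source.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    grid = base + flow.permute(0, 2, 3, 1)  # offset the identity grid
    return F.grid_sample(source, grid, align_corners=True)

src = torch.randn(1, 3, 64, 64)          # source portrait
flow = 0.05 * torch.randn(1, 2, 64, 64)  # toy flow standing in for a prediction
print(warp_with_flow(src, flow).shape)   # torch.Size([1, 3, 64, 64])
```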
arXiv Detail & Related papers (2022-12-15T18:59:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.