FaceSnap: Enhanced ID-fidelity Network for Tuning-free Portrait Customization
- URL: http://arxiv.org/abs/2602.00627v1
- Date: Sat, 31 Jan 2026 09:48:48 GMT
- Title: FaceSnap: Enhanced ID-fidelity Network for Tuning-free Portrait Customization
- Authors: Benxiang Zhai, Yifang Xu, Guofeng Zhang, Yang Li, Sidan Du,
- Abstract summary: FaceSnap is a novel method that requires only a single reference image to produce consistent results in a single inference stage.<n>New Facial Attribute Mixer can extract comprehensive fused information from both low-level specific features and high-level abstract features.<n>Landmark Predictor maintains reference identity across landmarks with different poses, providing diverse yet detailed spatial control conditions.
- Score: 10.500766709949602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benefiting from the significant advancements in text-to-image diffusion models, research in personalized image generation, particularly customized portrait generation, has also made great strides recently. However, existing methods either require time-consuming fine-tuning and lack generalizability or fail to achieve high fidelity in facial details. To address these issues, we propose FaceSnap, a novel method based on Stable Diffusion (SD) that requires only a single reference image and produces extremely consistent results in a single inference stage. This method is plug-and-play and can be easily extended to different SD models. Specifically, we design a new Facial Attribute Mixer that can extract comprehensive fused information from both low-level specific features and high-level abstract features, providing better guidance for image generation. We also introduce a Landmark Predictor that maintains reference identity across landmarks with different poses, providing diverse yet detailed spatial control conditions for image generation. Then we use an ID-preserving module to inject these into the UNet. Experimental results demonstrate that our approach performs remarkably in personalized and customized portrait generation, surpassing other state-of-the-art methods in this domain.
Related papers
- HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion [12.382436378979564]
HiFi-Portrait is a high-fidelity method for zero-shot portrait generation.<n>Our method surpasses the SOTA approaches in face similarity and controllability.
arXiv Detail & Related papers (2025-12-16T16:17:46Z) - Personalized Face Super-Resolution with Identity Decoupling and Fitting [50.473357681579664]
In extreme degradation scenarios, critical attributes and ID information are often severely lost in the input image.<n>Existing methods tend to generate hallucinated faces under such conditions, producing restored images lacking authentic ID constraints.<n>We propose a novel FSR method with Identity Decoupling and Fitting (IDFSR) to enhance ID restoration under large scaling factors.
arXiv Detail & Related papers (2025-08-13T02:33:11Z) - Multi-focal Conditioned Latent Diffusion for Person Image Synthesis [59.113899155476005]
The Latent Diffusion Model (LDM) has demonstrated strong capabilities in high-resolution image generation.<n>We propose a Multi-focal Conditioned Latent Diffusion (MCLD) method to address these limitations.<n>Our approach utilizes a multi-focal condition aggregation module, which effectively integrates facial identity and texture-specific information.
arXiv Detail & Related papers (2025-03-19T20:50:10Z) - HiFiVFS: High Fidelity Video Face Swapping [35.49571526968986]
Face swapping aims to generate results that combine the identity from the source with attributes from the target.<n>We propose a high fidelity video face swapping framework, which leverages the strong generative capability and temporal prior of Stable Video Diffusion.<n>Our method achieves state-of-the-art (SOTA) in video face swapping, both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-11-27T12:30:24Z) - Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z) - LCM-Lookahead for Encoder-based Text-to-Image Personalization [82.56471486184252]
We explore the potential of using shortcut-mechanisms to guide the personalization of text-to-image models.
We focus on encoder-based personalization approaches, and demonstrate that by tuning them with a lookahead identity loss, we can achieve higher identity fidelity.
arXiv Detail & Related papers (2024-04-04T17:43:06Z) - InstantID: Zero-shot Identity-Preserving Generation in Seconds [21.04236321562671]
We introduce InstantID, a powerful diffusion model-based solution for ID embedding.
Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image.
Our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL.
arXiv Detail & Related papers (2024-01-15T07:50:18Z) - PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved
Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation.
PortraitBooth eliminates computational overhead and mitigates identity distortion.
It incorporates emotion-aware cross-attention control for diverse facial expressions in generated images.
arXiv Detail & Related papers (2023-12-11T13:03:29Z) - When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for
Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $mathcalW_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z) - Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization.
We learn an identity encoder which can extract an identity representation from a set of reference images of a subject.
We show that our approach consistently outperforms existing fine-tuning based approach in both image generation and reconstruction.
arXiv Detail & Related papers (2023-04-14T23:32:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.