Related papers: FaceSnap: Enhanced ID-fidelity Network for Tuning-free Portrait Customization

FaceSnap: Enhanced ID-fidelity Network for Tuning-free Portrait Customization

URL: http://arxiv.org/abs/2602.00627v1
Date: Sat, 31 Jan 2026 09:48:48 GMT
Title: FaceSnap: Enhanced ID-fidelity Network for Tuning-free Portrait Customization
Authors: Benxiang Zhai, Yifang Xu, Guofeng Zhang, Yang Li, Sidan Du,
Abstract summary: FaceSnap is a novel method that requires only a single reference image to produce consistent results in a single inference stage.<n>New Facial Attribute Mixer can extract comprehensive fused information from both low-level specific features and high-level abstract features.<n>Landmark Predictor maintains reference identity across landmarks with different poses, providing diverse yet detailed spatial control conditions.
Score: 10.500766709949602
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Benefiting from the significant advancements in text-to-image diffusion models, research in personalized image generation, particularly customized portrait generation, has also made great strides recently. However, existing methods either require time-consuming fine-tuning and lack generalizability or fail to achieve high fidelity in facial details. To address these issues, we propose FaceSnap, a novel method based on Stable Diffusion (SD) that requires only a single reference image and produces extremely consistent results in a single inference stage. This method is plug-and-play and can be easily extended to different SD models. Specifically, we design a new Facial Attribute Mixer that can extract comprehensive fused information from both low-level specific features and high-level abstract features, providing better guidance for image generation. We also introduce a Landmark Predictor that maintains reference identity across landmarks with different poses, providing diverse yet detailed spatial control conditions for image generation. Then we use an ID-preserving module to inject these into the UNet. Experimental results demonstrate that our approach performs remarkably in personalized and customized portrait generation, surpassing other state-of-the-art methods in this domain.

Related papers

HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion [12.382436378979564]
HiFi-Portrait is a high-fidelity method for zero-shot portrait generation.<n>Our method surpasses the SOTA approaches in face similarity and controllability.
arXiv Detail & Related papers (2025-12-16T16:17:46Z)
Personalized Face Super-Resolution with Identity Decoupling and Fitting [50.473357681579664]
In extreme degradation scenarios, critical attributes and ID information are often severely lost in the input image.<n>Existing methods tend to generate hallucinated faces under such conditions, producing restored images lacking authentic ID constraints.<n>We propose a novel FSR method with Identity Decoupling and Fitting (IDFSR) to enhance ID restoration under large scaling factors.
arXiv Detail & Related papers (2025-08-13T02:33:11Z)
Multi-focal Conditioned Latent Diffusion for Person Image Synthesis [59.113899155476005]
The Latent Diffusion Model (LDM) has demonstrated strong capabilities in high-resolution image generation.<n>We propose a Multi-focal Conditioned Latent Diffusion (MCLD) method to address these limitations.<n>Our approach utilizes a multi-focal condition aggregation module, which effectively integrates facial identity and texture-specific information.
arXiv Detail & Related papers (2025-03-19T20:50:10Z)
HiFiVFS: High Fidelity Video Face Swapping [35.49571526968986]
Face swapping aims to generate results that combine the identity from the source with attributes from the target.<n>We propose a high fidelity video face swapping framework, which leverages the strong generative capability and temporal prior of Stable Video Diffusion.<n>Our method achieves state-of-the-art (SOTA) in video face swapping, both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-11-27T12:30:24Z)
Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence. However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image. We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z)
LCM-Lookahead for Encoder-based Text-to-Image Personalization [82.56471486184252]
We explore the potential of using shortcut-mechanisms to guide the personalization of text-to-image models. We focus on encoder-based personalization approaches, and demonstrate that by tuning them with a lookahead identity loss, we can achieve higher identity fidelity.
arXiv Detail & Related papers (2024-04-04T17:43:06Z)
InstantID: Zero-shot Identity-Preserving Generation in Seconds [21.04236321562671]
We introduce InstantID, a powerful diffusion model-based solution for ID embedding. Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image. Our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL.
arXiv Detail & Related papers (2024-01-15T07:50:18Z)
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation. PortraitBooth eliminates computational overhead and mitigates identity distortion. It incorporates emotion-aware cross-attention control for diverse facial expressions in generated images.
arXiv Detail & Related papers (2023-12-11T13:03:29Z)
When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images. We present a novel use of the extended StyleGAN embedding space $mathcalW_+$ to achieve enhanced identity preservation and disentanglement for diffusion models. Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization. We learn an identity encoder which can extract an identity representation from a set of reference images of a subject. We show that our approach consistently outperforms existing fine-tuning based approach in both image generation and reconstruction.
arXiv Detail & Related papers (2023-04-14T23:32:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.