HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion
- URL: http://arxiv.org/abs/2512.14542v1
- Date: Tue, 16 Dec 2025 16:17:46 GMT
- Title: HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion
- Authors: Yifang Xu, Benxiang Zhai, Yunzhuo Sun, Ming Li, Yang Li, Sidan Du
- Abstract summary: HiFi-Portrait is a high-fidelity method for zero-shot portrait generation. Our method surpasses the SOTA approaches in face similarity and controllability.
- Score: 12.382436378979564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in diffusion-based technologies have made significant strides, particularly in identity-preserved portrait generation (IPG). However, when using multiple reference images from the same ID, existing methods typically produce lower-fidelity portraits and struggle to customize face attributes precisely. To address these issues, this paper presents HiFi-Portrait, a high-fidelity method for zero-shot portrait generation. Specifically, we first introduce the face refiner and landmark generator to obtain fine-grained multi-face features and 3D-aware face landmarks. The landmarks include the reference ID and the target attributes. Then, we design HiFi-Net to fuse multi-face features and align them with landmarks, which improves ID fidelity and face control. In addition, we devise an automated pipeline to construct an ID-based dataset for training HiFi-Portrait. Extensive experimental results demonstrate that our method surpasses the SOTA approaches in face similarity and controllability. Furthermore, our method is also compatible with previous SDXL-based works.
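The abstract describes fusing fine-grained features from multiple reference faces of the same ID. As a purely illustrative sketch (not the paper's actual HiFi-Net, whose architecture is not specified here), multi-face fusion can be approximated by similarity-weighted pooling of per-reference ID embeddings, so that outlier references (bad crops, occlusions) contribute less to the fused identity vector:

```python
import numpy as np

def fuse_face_embeddings(embeddings: np.ndarray) -> np.ndarray:
    """Fuse per-reference ID embeddings (shape [n, d]) into one unit vector.

    Hypothetical stand-in for multi-face fusion: each embedding is
    weighted by its cosine similarity to the mean identity direction.
    """
    # L2-normalize each reference embedding.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Mean identity direction across references.
    center = normed.mean(axis=0)
    center /= np.linalg.norm(center)
    # Softmax over cosine similarities gives per-reference weights.
    sims = normed @ center
    weights = np.exp(sims) / np.exp(sims).sum()
    # Weighted sum, re-normalized to unit length.
    fused = (weights[:, None] * normed).sum(axis=0)
    return fused / np.linalg.norm(fused)

# Example: three noisy references of the same (synthetic) identity.
rng = np.random.default_rng(0)
identity = rng.normal(size=512)
refs = np.stack([identity + 0.1 * rng.normal(size=512) for _ in range(3)])
fused = fuse_face_embeddings(refs)
print(fused.shape)  # (512,)
```

The fused vector would then condition the diffusion backbone; the paper additionally aligns such features with 3D-aware landmarks, which this sketch omits.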
Related papers
- Diff-PC: Identity-preserving and 3D-aware Controllable Diffusion for Zero-shot Portrait Customization [13.128154695283477]
Diff-PC is a diffusion-based framework for zero-shot portrait customization (PC). It generates realistic portraits with high ID fidelity, specified facial attributes, and diverse backgrounds. Our approach employs the 3D face predictor to reconstruct the 3D-aware facial priors.
arXiv Detail & Related papers (2026-01-31T10:15:41Z) - FaceSnap: Enhanced ID-fidelity Network for Tuning-free Portrait Customization [10.500766709949602]
FaceSnap is a novel method that requires only a single reference image to produce consistent results in a single inference stage. Its Facial Attribute Mixer extracts comprehensive fused information from both low-level specific features and high-level abstract features. The Landmark Predictor maintains reference identity across landmarks with different poses, providing diverse yet detailed spatial control conditions.
arXiv Detail & Related papers (2026-01-31T09:48:48Z) - From Large Angles to Consistent Faces: Identity-Preserving Video Generation via Mixture of Facial Experts [69.44297222099175]
We introduce a Mixture of Facial Experts (MoFE) that captures distinct but mutually reinforcing aspects of facial attributes. To mitigate dataset limitations, we have tailored a data processing pipeline centered on two key aspects: Face Constraints and Identity Consistency. We have curated and refined a Large Face Angles (LFA) dataset from existing open-source human video datasets.
arXiv Detail & Related papers (2025-08-13T04:10:16Z) - Personalized Face Super-Resolution with Identity Decoupling and Fitting [50.473357681579664]
In extreme degradation scenarios, critical attributes and ID information are often severely lost in the input image. Existing methods tend to generate hallucinated faces under such conditions, producing restored images lacking authentic ID constraints. We propose a novel FSR method with Identity Decoupling and Fitting (IDFSR) to enhance ID restoration under large scaling factors.
arXiv Detail & Related papers (2025-08-13T02:33:11Z) - IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait [51.18967854258571]
IC-Portrait is a novel framework designed to accurately encode individual identities for personalized portrait generation. Our key insight is that pre-trained diffusion models are fast learners for in-context dense correspondence matching. We show that IC-Portrait consistently outperforms existing state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2025-01-28T18:59:03Z) - Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z) - G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors [71.69161292330504]
Reversible face anonymization seeks to replace sensitive identity information in facial images with synthesized alternatives.
This paper introduces G²Face, which leverages both generative and geometric priors to enhance identity manipulation.
Our method outperforms existing state-of-the-art techniques in face anonymization and recovery, while preserving high data utility.
arXiv Detail & Related papers (2024-08-18T12:36:47Z) - ID-Sculpt: ID-aware 3D Head Generation from Single In-the-wild Portrait Image [57.46195661521239]
Previous text-based methods for generating 3D heads were limited by text descriptions, and image-based methods struggled to produce high-quality head geometry. We propose a novel framework, ID-Sculpt, to generate high-quality 3D heads while preserving their identities. Extensive experiments demonstrate that we can generate high-quality 3D heads with accurate geometry and texture from a single in-the-wild portrait image.
arXiv Detail & Related papers (2024-06-24T15:11:35Z) - ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z) - Face Swap via Diffusion Model [4.026688121914668]
This report presents a diffusion model based framework for face swapping between two portrait images.
The basic framework consists of three components, for face feature encoding, multi-conditional generation, and face inpainting respectively.
arXiv Detail & Related papers (2024-03-02T07:02:17Z) - InstantID: Zero-shot Identity-Preserving Generation in Seconds [21.04236321562671]
We introduce InstantID, a powerful diffusion model-based solution for ID embedding.
Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image.
Our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL.
arXiv Detail & Related papers (2024-01-15T07:50:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.