Show and Polish: Reference-Guided Identity Preservation in Face Video Restoration
- URL: http://arxiv.org/abs/2507.10293v1
- Date: Mon, 14 Jul 2025 14:01:37 GMT
- Title: Show and Polish: Reference-Guided Identity Preservation in Face Video Restoration
- Authors: Wenkang Han, Wang Lin, Yiyun Zhou, Qi Liu, Shulei Wang, Chang Yao, Jingyuan Chen
- Abstract summary: Face Video Restoration (FVR) aims to recover high-quality face videos from degraded versions. Traditional methods struggle to preserve fine-grained, identity-specific features when degradation is severe. We introduce IP-FVR, a novel method that leverages a high-quality reference face image as a visual prompt to provide identity conditioning during the denoising process.
- Score: 9.481604837168762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Face Video Restoration (FVR) aims to recover high-quality face videos from degraded versions. Traditional methods struggle to preserve fine-grained, identity-specific features when degradation is severe, often producing average-looking faces that lack individual characteristics. To address these challenges, we introduce IP-FVR, a novel method that leverages a high-quality reference face image as a visual prompt to provide identity conditioning during the denoising process. IP-FVR incorporates semantically rich identity information from the reference image using decoupled cross-attention mechanisms, ensuring detailed and identity consistent results. For intra-clip identity drift (within 24 frames), we introduce an identity-preserving feedback learning method that combines cosine similarity-based reward signals with suffix-weighted temporal aggregation. This approach effectively minimizes drift within sequences of frames. For inter-clip identity drift, we develop an exponential blending strategy that aligns identities across clips by iteratively blending frames from previous clips during the denoising process. This method ensures consistent identity representation across different clips. Additionally, we enhance the restoration process with a multi-stream negative prompt, guiding the model's attention to relevant facial attributes and minimizing the generation of low-quality or incorrect features. Extensive experiments on both synthetic and real-world datasets demonstrate that IP-FVR outperforms existing methods in both quality and identity preservation, showcasing its substantial potential for practical applications in face video restoration.
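The abstract's two anti-drift mechanisms lend themselves to a small numerical illustration. The sketch below is hypothetical and not the authors' code: `identity_reward` computes a suffix-weighted average of cosine similarities between per-frame face embeddings and the reference embedding (later frames weighted more, so end-of-clip drift is penalized harder), and `blend_clip_boundary` shows one plausible form of exponentially decaying blending of previous-clip frames into the current clip. The decay schedules, function names, and blending window are assumptions, since the abstract does not specify them.

```python
import numpy as np

def identity_reward(frame_embs, ref_emb, decay=0.9):
    """Suffix-weighted cosine-similarity reward (hypothetical sketch).

    frame_embs: (T, D) per-frame identity embeddings.
    ref_emb:    (D,) reference-image identity embedding.
    Later frames receive larger weights; weights are normalized to sum to 1.
    """
    frame_embs = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    ref_emb = ref_emb / np.linalg.norm(ref_emb)
    sims = frame_embs @ ref_emb                    # cosine similarity per frame
    T = len(sims)
    weights = decay ** np.arange(T - 1, -1, -1)    # suffix weighting: last frame has weight 1
    weights = weights / weights.sum()
    return float(weights @ sims)

def blend_clip_boundary(prev_frames, cur_frames, alpha=0.5):
    """Exponential blending across a clip boundary (assumed form).

    The influence of the previous clip decays geometrically with
    distance into the current clip, aligning identity across clips.
    """
    k = min(len(prev_frames), len(cur_frames))
    out = cur_frames.copy()
    for i in range(k):
        w = alpha ** (i + 1)                       # previous-clip weight shrinks per frame
        out[i] = w * prev_frames[len(prev_frames) - k + i] + (1 - w) * cur_frames[i]
    return out
```

In a diffusion setting, the reward would be backpropagated as a feedback-learning signal and the blending applied to intermediate denoised latents rather than pixels; this sketch only shows the arithmetic.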
Related papers
- Audio-Assisted Face Video Restoration with Temporal and Identity Complementary Learning [56.62425904247682]
We propose a General Audio-assisted face Video restoration Network (GAVN) to address various types of streaming video distortions. GAVN first captures inter-frame temporal features in the low-resolution space to restore frames coarsely and save computational cost. Finally, the reconstruction module integrates temporal features and identity features to generate high-quality face videos.
arXiv Detail & Related papers (2025-08-06T07:38:27Z)
- Robust ID-Specific Face Restoration via Alignment Learning [18.869593414569206]
We present Robust ID-Specific Face Restoration (RIDFR), a novel ID-specific face restoration framework based on diffusion models. RIDFR incorporates Alignment Learning, which aligns the restoration results from multiple references with the same identity in order to suppress the interference of ID-irrelevant face semantics. Experiments demonstrate that our framework outperforms the state-of-the-art methods, reconstructing high-quality ID-specific results with high identity fidelity and demonstrating strong robustness.
arXiv Detail & Related papers (2025-07-15T03:16:12Z)
- Proteus-ID: ID-Consistent and Motion-Coherent Video Customization [17.792780924370103]
Video identity customization seeks to synthesize realistic, temporally coherent videos of a specific subject, given a single reference image and a text prompt. This task presents two core challenges: maintaining identity consistency while aligning with the described appearance and actions, and generating natural, fluid motion without unrealistic stiffness. We introduce Proteus-ID, a novel diffusion-based framework for identity-consistent and motion-coherent video customization.
arXiv Detail & Related papers (2025-06-30T11:05:32Z)
- DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration [24.004683996460685]
Video face restoration faces a critical challenge in maintaining temporal consistency while recovering facial details from degraded inputs. This paper presents a novel approach that extends Vector-Quantized Variational Autoencoders (VQ-VAEs), pretrained on static high-quality images, into a video restoration framework.
arXiv Detail & Related papers (2025-06-16T10:54:28Z)
- Reference-Guided Identity Preserving Face Restoration [54.10295747851343]
Preserving face identity is a critical yet persistent challenge in diffusion-based image restoration. This paper introduces a novel approach that maximizes reference face utility for improved face restoration and identity preservation.
arXiv Detail & Related papers (2025-05-28T02:46:34Z)
- Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter [10.608872317957026]
The "lip averaging" phenomenon occurs when a model fails to preserve subtle facial details when dubbing unseen in-the-wild videos. We propose UnAvgLip, which extracts identity embeddings from reference videos to generate highly faithful facial sequences.
arXiv Detail & Related papers (2025-03-09T02:36:31Z)
- EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion [3.592206475366951]
Existing methods struggle with "copy-paste" artifacts and low similarity issues. We propose EchoVideo, which integrates high-level semantic features from text to capture clean facial identity representations. It achieves excellent results in generating high-quality videos with strong controllability and fidelity.
arXiv Detail & Related papers (2025-01-23T08:06:11Z)
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration. We propose OSDFace, a novel one-step diffusion model for face restoration. Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
- Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos [99.42805906884499]
We first introduce a Real-world Low-Quality Face Video benchmark (RFV-LQ) to evaluate leading image-based face restoration algorithms.
We then conduct a thorough systematical analysis of the benefits and challenges associated with extending blind face image restoration algorithms to degraded face videos.
Our analysis identifies several key issues, primarily categorized into two aspects: significant jitters in facial components and noise-shape flickering between frames.
arXiv Detail & Related papers (2024-10-15T17:53:25Z)
- ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z)
- CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using Score-Based Diffusion Models [57.9771859175664]
Recent generative-prior-based methods have shown promising blind face restoration performance.
Generating fine-grained facial details faithful to inputs remains a challenging problem.
We introduce a diffusion-based-prior inside a VQGAN architecture that focuses on learning the distribution over uncorrupted latent embeddings.
arXiv Detail & Related papers (2024-02-08T23:51:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.