DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration
- URL: http://arxiv.org/abs/2506.13355v1
- Date: Mon, 16 Jun 2025 10:54:28 GMT
- Title: DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration
- Authors: Yan Chen, Hanlin Shang, Ce Liu, Yuxuan Chen, Hui Li, Weihao Yuan, Hao Zhu, Zilong Dong, Siyu Zhu
- Abstract summary: Video face restoration faces a critical challenge in maintaining temporal consistency while recovering facial details from degraded inputs. This paper presents a novel approach that extends Vector-Quantized Variational Autoencoders (VQ-VAEs), pretrained on static high-quality images, into a video restoration framework.
- Score: 24.004683996460685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video face restoration faces a critical challenge in maintaining temporal consistency while recovering fine facial details from degraded inputs. This paper presents a novel approach that extends Vector-Quantized Variational Autoencoders (VQ-VAEs), pretrained on static high-quality portraits, into a video restoration framework through variational latent space modeling. Our key innovation lies in reformulating discrete codebook representations as Dirichlet-distributed continuous variables, enabling probabilistic transitions between facial features across frames. A spatio-temporal Transformer architecture jointly models inter-frame dependencies and predicts latent distributions, while a Laplacian-constrained reconstruction loss combined with perceptual (LPIPS) regularization enhances both pixel accuracy and visual quality. Comprehensive evaluations on blind face restoration, video inpainting, and facial colorization tasks demonstrate state-of-the-art performance. This work establishes an effective paradigm for adapting image priors pretrained on high-quality stills to video restoration while addressing the critical challenge of flicker artifacts. The source code is available at https://github.com/fudan-generative-vision/DicFace.
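To make the abstract's core mechanism concrete, below is a minimal, hypothetical PyTorch sketch of a Dirichlet-relaxed codebook lookup: a head predicts per-location Dirichlet concentrations over a pretrained codebook, and the latent is a convex combination of code vectors weighted by a reparameterized sample. All names, shapes, and the softplus link are assumptions drawn from the abstract, not the authors' released implementation.

```python
# Hypothetical sketch of a Dirichlet-relaxed codebook lookup.
# NOT the official DicFace code: class names, shapes, and the softplus
# link are assumptions made for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Dirichlet

class DirichletCodebookLookup(nn.Module):
    """Replaces the hard VQ argmin lookup with a convex combination of
    code vectors, weighted by a per-location Dirichlet sample."""

    def __init__(self, num_codes: int = 1024, code_dim: int = 256):
        super().__init__()
        # Pretrained, frozen high-quality codebook (assumed).
        self.codebook = nn.Embedding(num_codes, code_dim)
        # Maps Transformer features to Dirichlet concentration parameters.
        self.to_alpha = nn.Linear(code_dim, num_codes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B*T, HW, code_dim) spatio-temporal Transformer features.
        # Softplus keeps concentrations strictly positive, as required.
        alpha = F.softplus(self.to_alpha(feats)) + 1e-4
        dist = Dirichlet(alpha)
        # rsample() is reparameterized, so gradients flow through sampling;
        # at inference the distribution mean gives a deterministic lookup.
        w = dist.rsample() if self.training else dist.mean  # (B*T, HW, num_codes)
        # Convex combination of code vectors: a continuous latent that can
        # transition smoothly between neighboring frames.
        return w @ self.codebook.weight                     # (B*T, HW, code_dim)
```

Because the weights lie on the probability simplex, this lookup reduces to hard vector quantization when a concentration peaks on one code, while otherwise allowing smooth frame-to-frame transitions; the Laplacian-constrained reconstruction and LPIPS losses mentioned in the abstract would be applied to the decoded frames and are omitted here.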
Related papers
- Show and Polish: Reference-Guided Identity Preservation in Face Video Restoration [9.481604837168762]
Face Video Restoration (FVR) aims to recover high-quality face videos from degraded versions. Traditional methods struggle to preserve fine-grained, identity-specific features when degradation is severe. We introduce IP-FVR, a novel method that leverages a high-quality reference face image as a visual prompt to provide identity conditioning during the denoising process.
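As a rough illustration of the reference-based identity conditioning summarized above, here is a hypothetical sketch in which features of the reference face are injected into the denoiser's features via cross-attention; module names, shapes, and the attention placement are assumptions, not IP-FVR's actual architecture.

```python
# Hypothetical sketch of reference-guided identity conditioning.
# NOT IP-FVR's architecture: names, shapes, and the attention placement
# are assumptions made for illustration only.
import torch
import torch.nn as nn

class ReferenceConditionedBlock(nn.Module):
    """Injects identity cues from a high-quality reference face into the
    denoiser's features via cross-attention."""

    def __init__(self, dim: int = 320, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
        # x:   (B, N, dim) denoiser features for the current frame
        # ref: (B, M, dim) encoded features of the reference face image
        out, _ = self.attn(query=self.norm(x), key=ref, value=ref)
        return x + out  # residual injection of identity information
```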
arXiv Detail & Related papers (2025-07-14T14:01:37Z)
- SVFR: A Unified Framework for Generalized Video Face Restoration [86.17060212058452]
Face Restoration (FR) is a crucial area within image and video processing, focusing on reconstructing high-quality portraits from degraded inputs. We propose a novel approach for the Generalized Video Face Restoration task, which integrates video BFR, inpainting, and colorization tasks. This work advances the state-of-the-art in video FR and establishes a new paradigm for generalized video face restoration.
arXiv Detail & Related papers (2025-01-02T12:51:20Z)
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration. We propose OSDFace, a novel one-step diffusion model for face restoration. Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
- Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency [36.939731355462264]
This study proposes a novel and efficient blind video face enhancement method.
It restores high-quality videos from their compressed low-quality versions with an effective de-flickering mechanism.
Experiments conducted on the VFHQ-Test dataset demonstrate that our method surpasses current state-of-the-art blind face video restoration and de-flickering methods in both efficiency and effectiveness.
arXiv Detail & Related papers (2024-11-25T15:14:36Z)
- Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos [99.42805906884499]
We first introduce a Real-world Low-Quality Face Video benchmark (RFV-LQ) to evaluate leading image-based face restoration algorithms.
We then conduct a thorough systematic analysis of the benefits and challenges associated with extending blind face image restoration algorithms to degraded face videos.
Our analysis identifies several key issues, primarily categorized into two aspects: significant jitters in facial components and noise-shape flickering between frames.
arXiv Detail & Related papers (2024-10-15T17:53:25Z)
- Kalman-Inspired Feature Propagation for Video Face Super-Resolution [78.84881180336744]
We introduce a novel framework to maintain a stable face prior over time.
Kalman filtering principles give our method a recurrent ability to use information from previously restored frames to guide and regulate the restoration of the current frame.
Experiments demonstrate the effectiveness of our method in capturing facial details consistently across video frames.
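The recurrent predict-and-update pattern described above can be illustrated with a short, hypothetical sketch: a learned transition predicts the current feature state from the previous restored frame, and a gated update (the analogue of a Kalman gain) fuses it with the current observation. Names, the convolutional gain, and shapes are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of Kalman-inspired feature propagation.
# NOT the paper's implementation: the convolutional gain and all
# names/shapes are assumptions made for illustration only.
import torch
import torch.nn as nn

class KalmanFeaturePropagation(nn.Module):
    """Predicts the current feature state from the previous restored frame,
    then fuses it with the current observation using a learned gain."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.predict = nn.Conv2d(dim, dim, 3, padding=1)  # state transition
        self.gain = nn.Sequential(                        # Kalman-gain analogue
            nn.Conv2d(2 * dim, dim, 1), nn.Sigmoid()
        )

    def forward(self, prev_state: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
        pred = self.predict(prev_state)                   # predict step
        k = self.gain(torch.cat([pred, obs], dim=1))      # per-pixel gain in [0, 1]
        return pred + k * (obs - pred)                    # update step

# Usage: carry the fused state frame to frame across a clip.
prop = KalmanFeaturePropagation(dim=256)
state = torch.zeros(1, 256, 32, 32)
for obs in torch.randn(8, 1, 256, 32, 32):               # 8 frames of features
    state = prop(state, obs)
```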
arXiv Detail & Related papers (2024-08-09T17:57:12Z)
- Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer [21.323165895036354]
We propose the first blind video face restoration approach with a novel parsing-guided temporal-coherent transformer (PGTFormer) without pre-alignment.
Specifically, we pre-train a temporal-spatial vector quantized auto-encoder on high-quality video face datasets to extract expressive context-rich priors.
This strategy reduces artifacts and mitigates jitter caused by cumulative errors from face pre-alignment.
arXiv Detail & Related papers (2024-04-21T12:33:07Z)
- CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using Score-Based Diffusion Models [57.9771859175664]
Recent generative-prior-based methods have shown promising blind face restoration performance.
Generating fine-grained facial details faithful to inputs remains a challenging problem.
We introduce a diffusion-based prior inside a VQGAN architecture that focuses on learning the distribution over uncorrupted latent embeddings.
arXiv Detail & Related papers (2024-02-08T23:51:49Z)
- Towards Robust Blind Face Restoration with Codebook Lookup Transformer [94.48731935629066]
Blind face restoration is a highly ill-posed problem that often requires auxiliary guidance.
We show that a learned discrete codebook prior in a small proxy space casts blind face restoration as a code prediction task.
We propose a Transformer-based prediction network, named CodeFormer, to model global composition and context of the low-quality faces.
arXiv Detail & Related papers (2022-06-22T17:58:01Z)
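For orientation on the "code prediction" framing, the following hypothetical sketch shows the general shape of the idea: a Transformer maps low-quality features to logits over a fixed high-quality codebook, and the argmax indices retrieve clean code vectors for decoding. Layer counts, names, and shapes are assumptions, not the released CodeFormer code.

```python
# Hypothetical sketch of restoration framed as code prediction.
# NOT the released CodeFormer code: layer counts, names, and shapes
# are assumptions made for illustration only.
import torch
import torch.nn as nn

class CodePredictor(nn.Module):
    """Predicts discrete codebook indices from low-quality face features,
    then looks up the corresponding high-quality code vectors."""

    def __init__(self, num_codes: int = 1024, dim: int = 256, seq_len: int = 256):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, seq_len, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=9)
        self.to_logits = nn.Linear(dim, num_codes)
        self.codebook = nn.Embedding(num_codes, dim)  # frozen HQ prior (assumed)

    def forward(self, lq_feats: torch.Tensor) -> torch.Tensor:
        # lq_feats: (B, seq_len, dim) flattened encoder features of the LQ face.
        h = self.transformer(lq_feats + self.pos)
        idx = self.to_logits(h).argmax(dim=-1)        # (B, seq_len) code indices
        return self.codebook(idx)                     # (B, seq_len, dim) HQ codes

# Training would supervise the logits with cross-entropy against ground-truth
# indices obtained from a pretrained VQ autoencoder; a fixed decoder then maps
# the looked-up codes back to a restored face.
```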