Audio-Assisted Face Video Restoration with Temporal and Identity Complementary Learning
- URL: http://arxiv.org/abs/2508.04161v1
- Date: Wed, 06 Aug 2025 07:38:27 GMT
- Title: Audio-Assisted Face Video Restoration with Temporal and Identity Complementary Learning
- Authors: Yuqin Cao, Yixuan Gao, Wei Sun, Xiaohong Liu, Yulun Zhang, Xiongkuo Min,
- Abstract summary: We propose a General Audio-assisted face Video restoration Network (GAVN) to address various types of streaming video distortions. GAVN first captures inter-frame temporal features in the low-resolution space to restore frames coarsely and save computational cost. Finally, the reconstruction module integrates temporal features and identity features to generate high-quality face videos.
- Score: 56.62425904247682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face videos accompanied by audio have become integral to our daily lives, while they often suffer from complex degradations. Most face video restoration methods neglect the intrinsic correlations between the visual and audio features, especially in mouth regions. A few audio-aided face video restoration methods have been proposed, but they only focus on compression artifact removal. In this paper, we propose a General Audio-assisted face Video restoration Network (GAVN) to address various types of streaming video distortions via identity and temporal complementary learning. Specifically, GAVN first captures inter-frame temporal features in the low-resolution space to restore frames coarsely and save computational cost. Then, GAVN extracts intra-frame identity features in the high-resolution space with the assistance of audio signals and face landmarks to restore more facial details. Finally, the reconstruction module integrates temporal features and identity features to generate high-quality face videos. Experimental results demonstrate that GAVN outperforms the existing state-of-the-art methods on face video compression artifact removal, deblurring, and super-resolution. Codes will be released upon publication.
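The abstract describes a three-stage pipeline: coarse inter-frame temporal restoration in low-resolution space, intra-frame identity feature extraction conditioned on audio and landmarks, and a reconstruction module that fuses both feature streams. The sketch below illustrates only that data flow with toy scalar arithmetic; all function names, feature shapes, and fusion rules are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the three-stage GAVN data flow from the abstract.
# Names, shapes, and operations are illustrative stand-ins, not GAVN itself.

def temporal_branch(lr_frames):
    """Coarse restoration: blend each frame with its neighbors, a toy
    stand-in for inter-frame temporal aggregation in low-resolution space."""
    coarse = []
    for i, frame in enumerate(lr_frames):
        prev = lr_frames[max(i - 1, 0)]
        nxt = lr_frames[min(i + 1, len(lr_frames) - 1)]
        coarse.append([(a + b + c) / 3 for a, b, c in zip(prev, frame, nxt)])
    return coarse

def identity_branch(frame, audio_feat, landmarks):
    """Per-frame identity features conditioned on audio and face landmarks,
    a toy stand-in for the high-resolution intra-frame branch."""
    return [p + 0.1 * audio_feat + 0.1 * len(landmarks) for p in frame]

def reconstruct(temporal_feats, identity_feats):
    """Fuse the temporal and identity feature streams into restored frames."""
    return [
        [(t, s) and (t + s) / 2 for t, s in zip(tf, sf)]
        for tf, sf in zip(temporal_feats, identity_feats)
    ]

# Toy inputs: 3 "frames" of 4 values each, one audio scalar per frame,
# and 2 landmark coordinates.
frames = [[0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
audio = [0.5, 0.6, 0.7]
landmarks = [(10, 20), (30, 40)]

coarse = temporal_branch(frames)
identity = [identity_branch(f, a, landmarks) for f, a in zip(coarse, audio)]
restored = reconstruct(coarse, identity)
print(len(restored), len(restored[0]))  # 3 frames, 4 values each
```

The point of the sketch is the ordering the abstract emphasizes: temporal work happens first at low resolution (cheap, per-sequence), identity work second at high resolution (expensive, per-frame, audio-assisted), and fusion last.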
Related papers
- Show and Polish: Reference-Guided Identity Preservation in Face Video Restoration [9.481604837168762]
Face Video Restoration (FVR) aims to recover high-quality face videos from degraded versions. Traditional methods struggle to preserve fine-grained, identity-specific features when degradation is severe. We introduce IP-FVR, a novel method that leverages a high-quality reference face image as a visual prompt to provide identity conditioning during the denoising process.
arXiv Detail & Related papers (2025-07-14T14:01:37Z)
- DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration [24.004683996460685]
Video face restoration faces a critical challenge in maintaining temporal consistency while recovering facial details from degraded inputs. This paper presents a novel approach that extends Vector-Quantized Variational Autoencoders (VQ-VAEs), pretrained on static high-quality images, into a video restoration framework.
arXiv Detail & Related papers (2025-06-16T10:54:28Z)
- SVFR: A Unified Framework for Generalized Video Face Restoration [86.17060212058452]
Face Restoration (FR) is a crucial area within image and video processing, focusing on reconstructing high-quality portraits from degraded inputs. We propose a novel approach for the Generalized Video Face Restoration task, which integrates video BFR, inpainting, and colorization tasks. This work advances the state-of-the-art in video FR and establishes a new paradigm for generalized video face restoration.
arXiv Detail & Related papers (2025-01-02T12:51:20Z)
- Large Motion Video Autoencoding with Cross-modal Video VAE [52.13379965800485]
Video Variational Autoencoder (VAE) is essential for reducing video redundancy and facilitating efficient video generation. Existing Video VAEs have begun to address temporal compression; however, they often suffer from inadequate reconstruction performance. We present a novel and powerful video autoencoder capable of high-fidelity video encoding.
arXiv Detail & Related papers (2024-12-23T18:58:24Z)
- Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos [99.42805906884499]
We first introduce a Real-world Low-Quality Face Video benchmark (RFV-LQ) to evaluate leading image-based face restoration algorithms.
We then conduct a thorough systematical analysis of the benefits and challenges associated with extending blind face image restoration algorithms to degraded face videos.
Our analysis identifies several key issues, primarily categorized into two aspects: significant jitters in facial components and noise-shape flickering between frames.
arXiv Detail & Related papers (2024-10-15T17:53:25Z)
- Kalman-Inspired Feature Propagation for Video Face Super-Resolution [78.84881180336744]
We introduce a novel framework to maintain a stable face prior to time.
The Kalman filtering principles offer our method a recurrent ability to use the information from previously restored frames to guide and regulate the restoration process of the current frame.
Experiments demonstrate the effectiveness of our method in capturing facial details consistently across video frames.
arXiv Detail & Related papers (2024-08-09T17:57:12Z)
- Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer [21.323165895036354]
We propose the first blind video face restoration approach with a novel parsing-guided temporal-coherent transformer (PGTFormer) without pre-alignment.
Specifically, we pre-train a temporal-spatial vector quantized auto-encoder on high-quality video face datasets to extract expressive context-rich priors.
This strategy reduces artifacts and mitigates jitter caused by cumulative errors from face pre-alignment.
arXiv Detail & Related papers (2024-04-21T12:33:07Z)
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors [106.79923577700345]
Existing person-generic methods have difficulty in generating realistic and lip-synced videos.
We propose a two-stage framework consisting of audio-to-landmark generation and landmark-to-video rendering procedures.
Our method can produce more realistic, lip-synced, and identity-preserving videos than existing person-generic talking face generation methods.
arXiv Detail & Related papers (2023-05-15T01:31:32Z)
- Neural Compression-Based Feature Learning for Video Restoration [29.021502115116736]
This paper proposes learning noise-robust feature representations to help video restoration.
We design a neural compression module to filter the noise and keep the most useful information in features for video restoration.
arXiv Detail & Related papers (2022-03-17T09:59:26Z)
- Multi-modality Deep Restoration of Extremely Compressed Face Videos [36.83490465562509]
We develop a multi-modality deep convolutional neural network method for restoring face videos that are aggressively compressed.
The main innovation is a new DCNN architecture that incorporates known priors of multiple modalities.
Ample empirical evidence is presented to validate the superior performance of the proposed DCNN method on face videos.
arXiv Detail & Related papers (2021-07-05T16:29:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.