FLAIR: A Conditional Diffusion Framework with Applications to Face Video
Restoration
- URL: http://arxiv.org/abs/2311.15445v1
- Date: Sun, 26 Nov 2023 22:09:18 GMT
- Title: FLAIR: A Conditional Diffusion Framework with Applications to Face Video
Restoration
- Authors: Zihao Zou and Jiaming Liu and Shirin Shoushtari and Yubo Wang and
Weijie Gan and Ulugbek S. Kamilov
- Abstract summary: We present a new conditional diffusion framework called FLAIR for face video restoration.
FLAIR ensures temporal consistency across frames in a computationally efficient fashion.
Our experiments show the superiority of FLAIR over the current state-of-the-art (SOTA) for video super-resolution, deblurring, JPEG restoration, and space-time frame interpolation on two high-quality face video datasets.
- Score: 14.17192434286707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face video restoration (FVR) is a challenging but important problem where one
seeks to recover a perceptually realistic face video from a low-quality input.
While diffusion probabilistic models (DPMs) have been shown to achieve
remarkable performance for face image restoration, they often fail to produce
temporally coherent, high-quality videos, compromising the fidelity of
reconstructed faces. We present a new conditional diffusion framework called
FLAIR for FVR. FLAIR ensures temporal consistency across frames in a
computationally efficient fashion by converting a traditional image DPM into a
video DPM. The proposed conversion uses a recurrent video refinement layer and
a temporal self-attention mechanism at different scales. FLAIR also uses a conditional
iterative refinement process to balance the perceptual and distortion quality
during inference. This process consists of two key components: a
data-consistency module that analytically ensures that the generated video
precisely matches its degraded observation and a coarse-to-fine image
enhancement module specifically for facial regions. Our extensive experiments
show the superiority of FLAIR over the current state-of-the-art (SOTA) for video
super-resolution, deblurring, JPEG restoration, and space-time frame
interpolation on two high-quality face video datasets.
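For intuition, the sketch below shows the kind of data-consistency step the abstract describes, inside a plain DDIM-style sampling loop. It assumes the degradation is scale-factor average-pool downsampling and treats `denoiser` and `alphas_cumprod` as given; this is a generic illustration of diffusion data consistency, not FLAIR's implementation.

```python
import torch
import torch.nn.functional as F

def data_consistency(x_hr, y_lr, scale):
    # Project x_hr onto {x : A x = y_lr} with A = average-pool downsampling:
    # x <- x + A^T (A A^T)^(-1) (y - A x). For this A the correction reduces
    # to nearest-neighbour upsampling of the low-resolution residual.
    residual = y_lr - F.avg_pool2d(x_hr, scale)
    return x_hr + F.interpolate(residual, scale_factor=scale, mode="nearest")

@torch.no_grad()
def conditional_sample(denoiser, y_lr, scale, alphas_cumprod):
    # alphas_cumprod: 1-D tensor of cumulative noise-schedule products.
    b, c, h, w = y_lr.shape
    x = torch.randn(b, c, h * scale, w * scale, device=y_lr.device)
    for t in reversed(range(len(alphas_cumprod))):
        a_t = alphas_cumprod[t]
        eps = denoiser(x, t)                                # predicted noise
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # clean estimate
        x0 = data_consistency(x0, y_lr, scale)              # enforce A x0 = y
        if t > 0:
            a_prev = alphas_cumprod[t - 1]
            x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # DDIM step
        else:
            x = x0
    return x
```

A hard projection like this pins the reconstruction to the measurements exactly; balancing perceptual against distortion quality, as the abstract mentions, typically means applying such a correction only partially or on selected steps.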
Related papers
- Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution [151.1255837803585]
We propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo) for video super-resolution.
SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction.
Experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
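As a rough illustration of the spatial side of such guidance, the module below predicts scale and shift maps from the LR frame and uses them to modulate denoiser features (a spatial-feature-transform-style block; the class name and layer sizes are hypothetical, not SATeCo's design).

```python
import torch.nn as nn
import torch.nn.functional as F

class LRGuidedModulation(nn.Module):
    # Calibrate HR denoiser features with guidance computed from the LR frame.
    def __init__(self, feat_ch, lr_ch=3):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(lr_ch, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, 2 * feat_ch, 3, padding=1),
        )

    def forward(self, feat, lr_frame):
        guide = F.interpolate(lr_frame, size=feat.shape[-2:],
                              mode="bilinear", align_corners=False)
        gamma, beta = self.encode(guide).chunk(2, dim=1)
        return feat * (1 + gamma) + beta
```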
arXiv Detail & Related papers (2024-03-25T17:59:26Z) - Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution [15.197746480157651]
We propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models.
We exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss.
The proposed motion-guided latent diffusion based VSR algorithm achieves significantly better perceptual quality than state-of-the-art methods on real-world VSR benchmark datasets.
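A minimal sketch of such a motion-guided objective is given below: each decoded frame is compared with its flow-warped predecessor, and the loss can then be back-propagated to the latents to steer the sampling path. The helper names and tensor layouts are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    # Backward-warp frame (B,C,H,W) with flow (B,2,H,W) that maps pixels of
    # the current frame to their locations in `frame`.
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=frame.device),
                            torch.arange(w, device=frame.device), indexing="ij")
    coords = torch.stack((xs, ys), dim=-1).float() + flow.permute(0, 2, 3, 1)
    grid = torch.stack((2 * coords[..., 0] / (w - 1) - 1,
                        2 * coords[..., 1] / (h - 1) - 1), dim=-1)
    return F.grid_sample(frame, grid, align_corners=True)

def motion_guided_loss(frames, flows):
    # frames: (B,T,C,H,W) decoded HR estimates; flows: (B,T-1,2,H,W) backward
    # flows estimated on the LR video (flows[:, i] samples frame i for frame i+1).
    loss = frames.new_zeros(())
    for t in range(1, frames.shape[1]):
        loss = loss + F.l1_loss(frames[:, t], warp(frames[:, t - 1], flows[:, t - 1]))
    return loss
```

During sampling, one gradient step on the latent against this loss (keeping the decoder differentiable) nudges the trajectory toward temporally consistent solutions.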
arXiv Detail & Related papers (2023-12-01T14:40:07Z) - Edit Temporal-Consistent Videos with Image Diffusion Model [49.88186997567138]
Large-scale text-to-image (T2I) diffusion models have been extended for text-guided video editing.
The proposed method achieves state-of-the-art performance in both video temporal consistency and video editing capability.
arXiv Detail & Related papers (2023-08-17T16:40:55Z) - RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
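The recurrent idea can be sketched as an encoder whose per-frame latent is fused with the previous frame's state, so consecutive inversions stay coherent. The tiny backbone and GRU fusion below are hypothetical stand-ins, not RIGID's architecture.

```python
import torch
import torch.nn as nn

class RecurrentInverter(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(              # toy per-frame encoder
            nn.Conv2d(3, 64, 4, stride=4), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        self.fuse = nn.GRUCell(latent_dim, latent_dim)

    def forward(self, frames):                      # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        state = frames.new_zeros(b, self.fuse.hidden_size)
        latents = []
        for i in range(t):
            state = self.fuse(self.backbone(frames[:, i]), state)
            latents.append(state)                   # latent for the GAN generator
        return torch.stack(latents, dim=1)          # (B, T, latent_dim)
```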
arXiv Detail & Related papers (2023-08-11T12:17:24Z) - Learning Spatiotemporal Frequency-Transformer for Low-Quality Video
Super-Resolution [47.5883522564362]
Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos.
Existing VSR techniques usually recover HR frames by extracting textures from nearby frames with known degradation processes.
We propose a novel Frequency-Transformer (FTVSR) for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain.
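To make frequency-domain attention concrete, the sketch below DCT-transforms fixed-size tiles of a feature map and runs self-attention over the resulting frequency tokens. FTVSR's block additionally spans the space and time axes, which are omitted here; the class and parameters are illustrative.

```python
import math
import torch
import torch.nn as nn

def dct_matrix(n):
    # Orthonormal DCT-II basis (rows = frequencies).
    k = torch.arange(n).float()
    basis = torch.cos(math.pi / n * (k[None, :] + 0.5) * k[:, None])
    basis[0] *= 1 / math.sqrt(2)
    return basis * math.sqrt(2 / n)

class FrequencyAttention(nn.Module):
    def __init__(self, patch=8, heads=4):
        super().__init__()
        self.patch = patch
        self.register_buffer("dct", dct_matrix(patch))
        self.attn = nn.MultiheadAttention(patch * patch, heads, batch_first=True)

    def forward(self, x):           # x: (B, C, H, W), H and W divisible by patch
        b, c, h, w = x.shape
        p = self.patch
        tiles = x.unfold(2, p, p).unfold(3, p, p)       # (B, C, H/p, W/p, p, p)
        freq = self.dct @ tiles @ self.dct.T            # 2-D DCT per tile
        tokens = freq.reshape(b, -1, p * p)             # one token per (channel, tile)
        out, _ = self.attn(tokens, tokens, tokens)
        out = out.reshape(b, c, h // p, w // p, p, p)
        out = self.dct.T @ out @ self.dct               # inverse DCT
        return out.permute(0, 1, 2, 4, 3, 5).reshape(b, c, h, w)
```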
arXiv Detail & Related papers (2022-12-27T16:26:15Z) - VRT: A Video Restoration Transformer [126.79589717404863]
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames.
We propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
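One common way to model long-range temporal dependencies is attention along the time axis at every spatial location, sketched below; this is a generic temporal-attention layer, not VRT's mutual-attention block.

```python
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, ch, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, x):               # x: (B, T, C, H, W)
        b, t, c, h, w = x.shape
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        out, _ = self.attn(tokens, tokens, tokens)  # each frame attends to all frames
        return out.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
```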
arXiv Detail & Related papers (2022-01-28T17:54:43Z) - Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video
Super-Resolution [100.11355888909102]
Space-time video super-resolution aims at generating a high-resolution (HR) slow-motion video from a low-resolution (LR) and low frame rate (LFR) video sequence.
We present a one-stage space-time video super-resolution framework, which can directly reconstruct an HR slow-motion video sequence from an input LR and LFR video.
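A toy sketch of the one-stage idea: synthesize the missing intermediate frame in feature space, then decode all frames with one shared upsampler. The actual method uses deformable alignment and a deformable ConvLSTM; the plain convolutions below are simplified placeholders.

```python
import torch
import torch.nn as nn

class OneStageSTVSR(nn.Module):
    def __init__(self, ch=64, scale=4):
        super().__init__()
        self.extract = nn.Conv2d(3, ch, 3, padding=1)
        self.blend = nn.Conv2d(2 * ch, ch, 3, padding=1)   # feature-space interpolation
        self.up = nn.Sequential(
            nn.Conv2d(ch, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),                        # sub-pixel upsampling
        )

    def forward(self, f0, f1):          # two consecutive LR frames (B, 3, H, W)
        a, b = self.extract(f0), self.extract(f1)
        mid = self.blend(torch.cat((a, b), dim=1))         # intermediate feature
        return [self.up(f) for f in (a, mid, b)]           # HR frames t = 0, 0.5, 1
```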
arXiv Detail & Related papers (2021-04-15T17:59:23Z) - C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal
Consistent Motion Transfer [5.220611885921671]
We propose Coarse-to-Fine Flow Warping Network (C2F-FWN) for spatial-temporally consistent human video motion transfer (HVMT).
C2F-FWN employs Flow Temporal Consistency (FTC) Loss to enhance temporal consistency.
Our approach outperforms state-of-the-art HVMT methods in terms of both spatial and temporal consistency.
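A flow-based temporal consistency term of this kind typically warps the previous output to the current time step and penalizes the difference where the flow is valid. The masked L1 below is a generic sketch of such a loss; the exact FTC formulation may differ.

```python
import torch
import torch.nn.functional as F

def flow_consistency_loss(prev_out, cur_out, flow, valid_mask):
    # prev_out, cur_out: (B,C,H,W) consecutive generated frames.
    # flow: (B,2,H,W) backward flow mapping pixels at time t to time t-1.
    # valid_mask: (B,1,H,W) in [0,1], down-weighting occluded pixels.
    b, _, h, w = prev_out.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=flow.device),
                            torch.arange(w, device=flow.device), indexing="ij")
    coords = torch.stack((xs, ys), dim=-1).float() + flow.permute(0, 2, 3, 1)
    grid = torch.stack((2 * coords[..., 0] / (w - 1) - 1,
                        2 * coords[..., 1] / (h - 1) - 1), dim=-1)
    warped = F.grid_sample(prev_out, grid, align_corners=True)
    return (valid_mask * (cur_out - warped).abs()).mean()
```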
arXiv Detail & Related papers (2020-12-16T14:11:13Z) - Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video
Super-Resolution [95.26202278535543]
A simple solution is to split space-time video super-resolution into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).
However, temporal interpolation and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)