FLAIR: A Conditional Diffusion Framework with Applications to Face Video
Restoration
- URL: http://arxiv.org/abs/2311.15445v1
- Date: Sun, 26 Nov 2023 22:09:18 GMT
- Title: FLAIR: A Conditional Diffusion Framework with Applications to Face Video
Restoration
- Authors: Zihao Zou and Jiaming Liu and Shirin Shoushtari and Yubo Wang and
Weijie Gan and Ulugbek S. Kamilov
- Abstract summary: We present a new conditional diffusion framework called FLAIR for face video restoration.
FLAIR ensures temporal consistency across frames in a computationally efficient fashion.
Our experiments show the superiority of FLAIR over the current state-of-the-art (SOTA) for video super-resolution, deblurring, JPEG restoration, and space-time frame interpolation on two high-quality face video datasets.
- Score: 14.17192434286707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face video restoration (FVR) is a challenging but important problem where one
seeks to recover a perceptually realistic face video from a low-quality input.
While diffusion probabilistic models (DPMs) have been shown to achieve
remarkable performance for face image restoration, they often fail to produce
temporally coherent, high-quality videos, compromising the fidelity of
reconstructed faces. We present a new conditional diffusion framework called
FLAIR for FVR. FLAIR ensures temporal consistency across frames in a
computationally efficient fashion by converting a traditional image DPM into a
video DPM. The proposed conversion uses a recurrent video refinement layer and
a temporal self-attention mechanism at different scales. FLAIR also uses a conditional
iterative refinement process to balance the perceptual and distortion quality
during inference. This process consists of two key components: a
data-consistency module that analytically ensures that the generated video
precisely matches its degraded observation and a coarse-to-fine image
enhancement module specifically for facial regions. Our extensive experiments
show the superiority of FLAIR over the current state-of-the-art (SOTA) for video
super-resolution, deblurring, JPEG restoration, and space-time frame
interpolation on two high-quality face video datasets.
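For intuition, the sketch below shows the kind of data-consistency step the abstract describes, inside a plain DDIM-style sampling loop. It assumes the degradation is scale-factor average-pool downsampling and treats `denoiser` and `alphas_cumprod` as given; this is a generic illustration of diffusion data consistency, not FLAIR's implementation.

```python
import torch
import torch.nn.functional as F

def data_consistency(x_hr, y_lr, scale):
    # Project x_hr onto {x : A x = y_lr} with A = average-pool downsampling:
    # x <- x + A^T (A A^T)^(-1) (y - A x). For this A the correction reduces
    # to nearest-neighbour upsampling of the low-resolution residual.
    residual = y_lr - F.avg_pool2d(x_hr, scale)
    return x_hr + F.interpolate(residual, scale_factor=scale, mode="nearest")

@torch.no_grad()
def conditional_sample(denoiser, y_lr, scale, alphas_cumprod):
    # alphas_cumprod: 1-D tensor of cumulative noise-schedule products.
    b, c, h, w = y_lr.shape
    x = torch.randn(b, c, h * scale, w * scale, device=y_lr.device)
    for t in reversed(range(len(alphas_cumprod))):
        a_t = alphas_cumprod[t]
        eps = denoiser(x, t)                                # predicted noise
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # clean estimate
        x0 = data_consistency(x0, y_lr, scale)              # enforce A x0 = y
        if t > 0:
            a_prev = alphas_cumprod[t - 1]
            x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # DDIM step
        else:
            x = x0
    return x
```

A hard projection like this pins the reconstruction to the measurements exactly; balancing perceptual against distortion quality, as the abstract mentions, typically means applying such a correction only partially or on selected steps.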
Related papers
- Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution [151.1255837803585]
We propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo) for video super-resolution.
SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction.
Experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
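As a rough illustration of the spatial side of such guidance, the module below predicts scale and shift maps from the LR frame and uses them to modulate denoiser features (a spatial-feature-transform-style block; the class name and layer sizes are hypothetical, not SATeCo's design).

```python
import torch.nn as nn
import torch.nn.functional as F

class LRGuidedModulation(nn.Module):
    # Calibrate HR denoiser features with guidance computed from the LR frame.
    def __init__(self, feat_ch, lr_ch=3):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(lr_ch, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, 2 * feat_ch, 3, padding=1),
        )

    def forward(self, feat, lr_frame):
        guide = F.interpolate(lr_frame, size=feat.shape[-2:],
                              mode="bilinear", align_corners=False)
        gamma, beta = self.encode(guide).chunk(2, dim=1)
        return feat * (1 + gamma) + beta
```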
arXiv Detail & Related papers (2024-03-25T17:59:26Z) - Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution [15.197746480157651]
We propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models.
We exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss.
The proposed motion-guided latent diffusion based VSR algorithm achieves significantly better perceptual quality than state-of-the-art methods on real-world VSR benchmark datasets.
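A minimal sketch of such a motion-guided objective is given below: each decoded frame is compared with its flow-warped predecessor, and the loss can then be back-propagated to the latents to steer the sampling path. The helper names and tensor layouts are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    # Backward-warp frame (B,C,H,W) with flow (B,2,H,W) that maps pixels of
    # the current frame to their locations in `frame`.
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=frame.device),
                            torch.arange(w, device=frame.device), indexing="ij")
    coords = torch.stack((xs, ys), dim=-1).float() + flow.permute(0, 2, 3, 1)
    grid = torch.stack((2 * coords[..., 0] / (w - 1) - 1,
                        2 * coords[..., 1] / (h - 1) - 1), dim=-1)
    return F.grid_sample(frame, grid, align_corners=True)

def motion_guided_loss(frames, flows):
    # frames: (B,T,C,H,W) decoded HR estimates; flows: (B,T-1,2,H,W) backward
    # flows estimated on the LR video (flows[:, i] samples frame i for frame i+1).
    loss = frames.new_zeros(())
    for t in range(1, frames.shape[1]):
        loss = loss + F.l1_loss(frames[:, t], warp(frames[:, t - 1], flows[:, t - 1]))
    return loss
```

During sampling, one gradient step on the latent against this loss (keeping the decoder differentiable) nudges the trajectory toward temporally consistent solutions.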
arXiv Detail & Related papers (2023-12-01T14:40:07Z) - Edit Temporal-Consistent Videos with Image Diffusion Model [49.88186997567138]
Large-scale text-to-image (T2I) diffusion models have been extended for text-guided video editing.
The proposed method achieves state-of-the-art performance in both video temporal consistency and video editing capability.
arXiv Detail & Related papers (2023-08-17T16:40:55Z) - RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
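The recurrent idea can be sketched as an encoder whose per-frame latent is fused with the previous frame's state, so consecutive inversions stay coherent. The tiny backbone and GRU fusion below are hypothetical stand-ins, not RIGID's architecture.

```python
import torch
import torch.nn as nn

class RecurrentInverter(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(              # toy per-frame encoder
            nn.Conv2d(3, 64, 4, stride=4), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        self.fuse = nn.GRUCell(latent_dim, latent_dim)

    def forward(self, frames):                      # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        state = frames.new_zeros(b, self.fuse.hidden_size)
        latents = []
        for i in range(t):
            state = self.fuse(self.backbone(frames[:, i]), state)
            latents.append(state)                   # latent for the GAN generator
        return torch.stack(latents, dim=1)          # (B, T, latent_dim)
```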
arXiv Detail & Related papers (2023-08-11T12:17:24Z) - Learning Spatiotemporal Frequency-Transformer for Low-Quality Video
Super-Resolution [47.5883522564362]
Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos.
Existing VSR techniques usually recover HR frames by extracting textures from nearby frames with known degradation processes.
We propose a novel Frequency-Transformer (FTVSR) for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain.
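To make frequency-domain attention concrete, the sketch below DCT-transforms fixed-size tiles of a feature map and runs self-attention over the resulting frequency tokens. FTVSR's block additionally spans the space and time axes, which are omitted here; the class and parameters are illustrative.

```python
import math
import torch
import torch.nn as nn

def dct_matrix(n):
    # Orthonormal DCT-II basis (rows = frequencies).
    k = torch.arange(n).float()
    basis = torch.cos(math.pi / n * (k[None, :] + 0.5) * k[:, None])
    basis[0] *= 1 / math.sqrt(2)
    return basis * math.sqrt(2 / n)

class FrequencyAttention(nn.Module):
    def __init__(self, patch=8, heads=4):
        super().__init__()
        self.patch = patch
        self.register_buffer("dct", dct_matrix(patch))
        self.attn = nn.MultiheadAttention(patch * patch, heads, batch_first=True)

    def forward(self, x):           # x: (B, C, H, W), H and W divisible by patch
        b, c, h, w = x.shape
        p = self.patch
        tiles = x.unfold(2, p, p).unfold(3, p, p)       # (B, C, H/p, W/p, p, p)
        freq = self.dct @ tiles @ self.dct.T            # 2-D DCT per tile
        tokens = freq.reshape(b, -1, p * p)             # one token per (channel, tile)
        out, _ = self.attn(tokens, tokens, tokens)
        out = out.reshape(b, c, h // p, w // p, p, p)
        out = self.dct.T @ out @ self.dct               # inverse DCT
        return out.permute(0, 1, 2, 4, 3, 5).reshape(b, c, h, w)
```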
arXiv Detail & Related papers (2022-12-27T16:26:15Z) - VRT: A Video Restoration Transformer [126.79589717404863]
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames.
We propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
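One common way to model long-range temporal dependencies is attention along the time axis at every spatial location, sketched below; this is a generic temporal-attention layer, not VRT's mutual-attention block.

```python
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, ch, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, x):               # x: (B, T, C, H, W)
        b, t, c, h, w = x.shape
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        out, _ = self.attn(tokens, tokens, tokens)  # each frame attends to all frames
        return out.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
```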
arXiv Detail & Related papers (2022-01-28T17:54:43Z) - Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video
Super-Resolution [100.11355888909102]
Space-time video super-resolution aims at generating a high-resolution (HR) slow-motion video from a low-resolution (LR) and low frame rate (LFR) video sequence.
We present a one-stage space-time video super-resolution framework, which can directly reconstruct an HR slow-motion video sequence from an input LR and LFR video.
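A toy sketch of the one-stage idea: synthesize the missing intermediate frame in feature space, then decode all frames with one shared upsampler. The actual method uses deformable alignment and a deformable ConvLSTM; the plain convolutions below are simplified placeholders.

```python
import torch
import torch.nn as nn

class OneStageSTVSR(nn.Module):
    def __init__(self, ch=64, scale=4):
        super().__init__()
        self.extract = nn.Conv2d(3, ch, 3, padding=1)
        self.blend = nn.Conv2d(2 * ch, ch, 3, padding=1)   # feature-space interpolation
        self.up = nn.Sequential(
            nn.Conv2d(ch, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),                        # sub-pixel upsampling
        )

    def forward(self, f0, f1):          # two consecutive LR frames (B, 3, H, W)
        a, b = self.extract(f0), self.extract(f1)
        mid = self.blend(torch.cat((a, b), dim=1))         # intermediate feature
        return [self.up(f) for f in (a, mid, b)]           # HR frames t = 0, 0.5, 1
```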
arXiv Detail & Related papers (2021-04-15T17:59:23Z) - C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal
Consistent Motion Transfer [5.220611885921671]
We propose Coarse-to-Fine Flow Warping Network (C2F-FWN) for spatial-temporally consistent human video motion transfer (HVMT).
C2F-FWN employs Flow Temporal Consistency (FTC) Loss to enhance temporal consistency.
Our approach outperforms state-of-the-art HVMT methods in terms of both spatial and temporal consistency.
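A flow-based temporal consistency term of this kind typically warps the previous output to the current time step and penalizes the difference where the flow is valid. The masked L1 below is a generic sketch of such a loss; the exact FTC formulation may differ.

```python
import torch
import torch.nn.functional as F

def flow_consistency_loss(prev_out, cur_out, flow, valid_mask):
    # prev_out, cur_out: (B,C,H,W) consecutive generated frames.
    # flow: (B,2,H,W) backward flow mapping pixels at time t to time t-1.
    # valid_mask: (B,1,H,W) in [0,1], down-weighting occluded pixels.
    b, _, h, w = prev_out.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=flow.device),
                            torch.arange(w, device=flow.device), indexing="ij")
    coords = torch.stack((xs, ys), dim=-1).float() + flow.permute(0, 2, 3, 1)
    grid = torch.stack((2 * coords[..., 0] / (w - 1) - 1,
                        2 * coords[..., 1] / (h - 1) - 1), dim=-1)
    warped = F.grid_sample(prev_out, grid, align_corners=True)
    return (valid_mask * (cur_out - warped).abs()).mean()
```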
arXiv Detail & Related papers (2020-12-16T14:11:13Z) - Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video
Super-Resolution [95.26202278535543]
A simple solution is to split space-time video super-resolution into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).
However, temporal interpolation and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)