Related papers: Discrete Prior-based Temporal-coherent Content Prediction for Blind Face Video Restoration

URL: http://arxiv.org/abs/2501.09960v1
Date: Fri, 17 Jan 2025 05:23:26 GMT
Title: Discrete Prior-based Temporal-coherent Content Prediction for Blind Face Video Restoration
Authors: Lianxin Xie, Bingbing Zheng, Wen Xue, Yunfei Zhang, Le Jiang, Ruotao Xu, Si Wu, Hau-San Wong
Abstract summary: Blind face video restoration aims to restore high-fidelity details from videos subjected to complex and unknown degradations. This paper introduces a Discrete Prior-based Temporal-Coherent content prediction transformer to address the challenge.
Score: 18.808917370860208
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Blind face video restoration aims to restore high-fidelity details from videos subjected to complex and unknown degradations. This task poses a significant challenge of managing temporal heterogeneity while at the same time maintaining stable face attributes. In this paper, we introduce a Discrete Prior-based Temporal-Coherent content prediction transformer to address the challenge, and our model is referred to as DP-TempCoh. Specifically, we incorporate a spatial-temporal-aware content prediction module to synthesize high-quality content from discrete visual priors, conditioned on degraded video tokens. To further enhance the temporal coherence of the predicted content, a motion statistics modulation module is designed to adjust the content, based on discrete motion priors in terms of cross-frame mean and variance. As a result, the statistics of the predicted content can match those of real videos over time. By performing extensive experiments, we verify the effectiveness of the design elements and demonstrate the superior performance of our DP-TempCoh in both synthetically and naturally degraded video restoration.
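Illustration (not from the paper): the abstract describes adjusting predicted content so that its cross-frame mean and variance match target motion statistics. The sketch below shows one generic way such a statistic-matching step could look, in the spirit of AdaIN-style normalization applied along the time axis. The function name `modulate_cross_frame_stats`, its arguments, and the per-channel list layout are all hypothetical; how DP-TempCoh actually obtains and applies its discrete motion priors is not specified here.

```python
from statistics import fmean, pstdev


def modulate_cross_frame_stats(frames, target_mean, target_std, eps=1e-5):
    """Rescale each channel so its statistics across frames match targets.

    frames:      list of T frames, each a list of C channel values
                 (a hypothetical, simplified stand-in for feature tokens).
    target_mean: list of C target cross-frame means (assumed given).
    target_std:  list of C target cross-frame standard deviations.
    """
    num_channels = len(frames[0])
    out = [[0.0] * num_channels for _ in frames]
    for c in range(num_channels):
        # Collect this channel's values over time.
        values = [frame[c] for frame in frames]
        mean = fmean(values)
        std = pstdev(values)  # population std across frames
        for t, v in enumerate(values):
            # Normalize over time, then apply the target statistics.
            normalized = (v - mean) / (std + eps)
            out[t][c] = normalized * target_std[c] + target_mean[c]
    return out


if __name__ == "__main__":
    frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    adjusted = modulate_cross_frame_stats(frames, [0.0, 5.0], [1.0, 2.0])
    for frame in adjusted:
        print(frame)
```

After this step, each channel's mean over the frames equals the requested target, and its standard deviation is close to the target (up to the small `eps` term).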

Related papers

Low-Cost Test-Time Adaptation for Robust Video Editing [4.707015344498921]
Video editing is a critical component of content creation that transforms raw footage into coherent works aligned with specific visual and narrative objectives. Existing approaches face two major challenges: temporal inconsistencies due to failure in capturing complex motion patterns, and overfitting to simple prompts arising from limitations in UNet backbone architectures. We present Vid-TTA, a lightweight test-time adaptation framework that personalizes optimization for each test video during inference through self-supervised auxiliary tasks.
arXiv Detail & Related papers (2025-07-29T14:31:17Z)
DicFace: Dirichlet-Constrained Variational Codebook Learning for Temporally Coherent Video Face Restoration [24.004683996460685]
Video face restoration faces a critical challenge in maintaining temporal consistency while recovering facial details from degraded inputs. This paper presents a novel approach that extends Vector-Quantized Variational Autoencoders (VQ-VAEs), pretrained on static high-quality images, into a video restoration framework.
arXiv Detail & Related papers (2025-06-16T10:54:28Z)
Temporal Inconsistency Guidance for Super-resolution Video Quality Assessment [63.811519474030234]
We propose a perception-oriented approach to quantify frame-wise temporal inconsistency. Inspired by the human visual system, we develop an Inconsistency Guided Temporal Module. Our method significantly outperforms state-of-the-art VQA approaches.
arXiv Detail & Related papers (2024-12-25T15:43:41Z)
Temporal Contrastive Learning for Video Temporal Reasoning in Large Vision-Language Models [44.99833362998488]
Temporal Semantic Alignment via Dynamic Prompting (TSADP) is a novel framework that enhances temporal reasoning capabilities. We evaluate TSADP on the VidSitu dataset, augmented with enriched temporal annotations. Our analysis highlights the robustness, efficiency, and practical utility of TSADP, making it a step forward in the field of video-language understanding.
arXiv Detail & Related papers (2024-12-16T02:37:58Z)
Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases [69.46487306858789]
Conditional Autoregressive Slot Attention (CA-SA) is a framework that enhances the temporal consistency of extracted object-centric representations in video-centric vision tasks. We present qualitative and quantitative results showing that our proposed method outperforms the considered baselines on downstream tasks.
arXiv Detail & Related papers (2024-10-21T07:44:44Z)
Edit Temporal-Consistent Videos with Image Diffusion Model [49.88186997567138]
Large-scale text-to-image (T2I) diffusion models have been extended for text-guided video editing. The proposed method achieves state-of-the-art performance in both video temporal consistency and video editing capability.
arXiv Detail & Related papers (2023-08-17T16:40:55Z)
DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment [56.42140467085586]
Some temporal variations cause temporal distortions and lead to extra quality degradation. The human visual system often pays different attention to frames with different contents. We propose a novel and effective transformer-based VQA method to tackle these two issues.
arXiv Detail & Related papers (2022-06-20T15:31:27Z)
STIP: A SpatioTemporal Information-Preserving and Perception-Augmented Model for High-Resolution Video Prediction [78.129039340528]
We propose a SpatioTemporal Information-Preserving and Perception-Augmented Model (STIP) to solve the above two problems. The proposed model aims to preserve the spatiotemporal information for videos during the feature extraction and the state transitions. Experimental results show that the proposed STIP can predict videos with more satisfactory visual quality compared with a variety of state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T09:49:04Z)
Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moire patterns, appearing as color distortions, severely degrade image and video quality when filming a screen with digital cameras. We study how to remove such undesirable moire patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z)
Intrinsic Temporal Regularization for High-resolution Human Video Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain. We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation. We apply our intrinsic temporal regularization to a single-image generator, leading to a powerful "INTERnet" capable of generating $512\times512$ resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.