Toward Accurate and Temporally Consistent Video Restoration from Raw
Data
- URL: http://arxiv.org/abs/2312.16247v1
- Date: Mon, 25 Dec 2023 12:38:03 GMT
- Title: Toward Accurate and Temporally Consistent Video Restoration from Raw
Data
- Authors: Shi Guo, Jianqi Ma, Xi Yang, Zhengqiang Zhang, Lei Zhang
- Abstract summary: We present a new VJDD framework based on consistent and accurate latent space propagation.
The proposed losses can circumvent the error accumulation problem caused by inaccurate flow estimation.
Experiments demonstrate the leading VJDD performance in terms of restoration accuracy, perceptual quality and temporal consistency.
- Score: 20.430231283171327
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Denoising and demosaicking are two fundamental steps in reconstructing a
clean full-color video from raw data, while performing video denoising and
demosaicking jointly, namely VJDD, could lead to better video restoration
performance than performing them separately. In addition to restoration
accuracy, another key challenge to VJDD lies in the temporal consistency of
consecutive frames. This issue is exacerbated when perceptual regularization terms
are introduced to enhance video perceptual quality. To address these
challenges, we present a new VJDD framework by consistent and accurate latent
space propagation, which leverages the estimation of previous frames as prior
knowledge to ensure consistent recovery of the current frame. A data temporal
consistency (DTC) loss and a relational perception consistency (RPC) loss are
accordingly designed. Compared with the commonly used flow-based losses, the
proposed losses can circumvent the error accumulation problem caused by
inaccurate flow estimation and effectively handle intensity changes in videos,
substantially improving the temporal consistency of output videos while preserving
texture details. Extensive experiments demonstrate the leading VJDD performance
of our method in terms of restoration accuracy, perceptual quality, and temporal
consistency. The code and dataset are available at
\url{https://github.com/GuoShi28/VJDD}.
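To make the propagation idea concrete, below is a minimal, hypothetical PyTorch-style sketch (not the authors' released code) of restoring a raw clip frame by frame while carrying a latent estimate of the previous frame forward as a prior. The network structure, channel counts, and function names are illustrative assumptions only; the paper's actual DTC and RPC losses are defined on top of such propagated estimates and are not reproduced here.
```python
import torch
import torch.nn as nn

class RestoreNet(nn.Module):
    """Toy joint denoising-demosaicking net: packed raw frame + propagated prior -> RGB (hypothetical)."""
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(4 + ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.to_rgb = nn.Conv2d(ch, 3, 3, padding=1)     # restored full-color frame
        self.to_prior = nn.Conv2d(ch, ch, 3, padding=1)  # latent estimate carried to the next frame

    def forward(self, raw, prior):
        feat = self.body(torch.cat([raw, prior], dim=1))
        return self.to_rgb(feat), self.to_prior(feat)

def restore_clip(net, raw_frames, ch=32):
    """Restore a clip frame by frame, propagating the previous latent estimate as prior knowledge."""
    b, t, _, h, w = raw_frames.shape
    prior = raw_frames.new_zeros(b, ch, h, w)  # no prior available for the first frame
    outputs = []
    for i in range(t):
        rgb, prior = net(raw_frames[:, i], prior)
        outputs.append(rgb)
    return torch.stack(outputs, dim=1)

# Usage: a 5-frame clip of 4-channel packed Bayer raw at a toy 64x64 resolution.
net = RestoreNet()
raw_clip = torch.rand(1, 5, 4, 64, 64)
rgb_clip = restore_clip(net, raw_clip)
print(rgb_clip.shape)  # torch.Size([1, 5, 3, 64, 64])
```
Because each frame is conditioned directly on the propagated latent rather than on a flow-warped previous output, a consistency loss computed on these estimates need not rely on explicit optical-flow estimation, which is the motivation the abstract gives for the DTC and RPC losses.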
Related papers
- Temporal Saliency-Guided Distillation: A Scalable Framework for Distilling Video Datasets [13.22969334943219]
We propose a novel uni-level video dataset distillation framework. To address temporal redundancy and enhance motion preservation, we introduce a temporal saliency-guided filtering mechanism. Our method achieves state-of-the-art performance, bridging the gap between real and distilled video data.
arXiv Detail & Related papers (2025-05-27T04:02:57Z) - Joint Flow And Feature Refinement Using Attention For Video Restoration [0.3811713174618588]
We propose a novel video restoration framework named Joint Flow and Feature Refinement using Attention (JFFRA). Our method demonstrates a remarkable performance improvement of up to 1.62 dB compared to state-of-the-art approaches.
arXiv Detail & Related papers (2025-05-22T09:18:51Z) - Temporal-Consistent Video Restoration with Pre-trained Diffusion Models [51.47188802535954]
Video restoration (VR) aims to recover high-quality videos from degraded ones.
Recent zero-shot VR methods using pre-trained diffusion models (DMs) suffer from approximation errors during reverse diffusion and insufficient temporal consistency.
We present a novel maximum a posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors.
arXiv Detail & Related papers (2025-03-19T03:41:56Z) - Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
A new tokenizer, the Conditioned Diffusion-based Tokenizer (CDT), replaces the GAN-based decoder with a conditional diffusion model.
It is trained from scratch using only a basic MSE diffusion loss for reconstruction, together with a KL term and an LPIPS perceptual loss.
Even a scaled-down version of CDT (3× inference speedup) still performs comparably with top baselines.
arXiv Detail & Related papers (2025-03-05T17:59:19Z) - TINQ: Temporal Inconsistency Guided Blind Video Quality Assessment [61.76431477117295]
Blind video quality assessment (BVQA) has been actively researched for user-generated content (UGC) videos.
Recent super-resolution (SR) techniques have been widely applied in videos.
Temporal inconsistency, referring to irregularities between consecutive frames, is relevant to video quality.
arXiv Detail & Related papers (2024-12-25T15:43:41Z) - Uncertainty-Aware Deep Video Compression with Ensembles [24.245365441718654]
We propose an uncertainty-aware video compression model that can effectively capture predictive uncertainty with deep ensembles.
Our model can effectively save more than 20% of bits on 1080p sequences.
arXiv Detail & Related papers (2024-03-28T05:44:48Z) - Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution [15.197746480157651]
We propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models.
We exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss.
The proposed motion-guided latent diffusion based VSR algorithm achieves significantly better perceptual quality than state-of-the-art methods on real-world VSR benchmark datasets.
arXiv Detail & Related papers (2023-12-01T14:40:07Z) - FLAIR: A Conditional Diffusion Framework with Applications to Face Video
Restoration [14.17192434286707]
We present a new conditional diffusion framework called FLAIR for face video restoration.
FLAIR ensures temporal consistency across frames in a computationally efficient fashion.
Our experiments show the superiority of FLAIR over the current state-of-the-art (SOTA) for video super-resolution, deblurring, JPEG restoration, and space-time frame interpolation on two high-quality face video datasets.
arXiv Detail & Related papers (2023-11-26T22:09:18Z) - RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z) - Transform-Equivariant Consistency Learning for Temporal Sentence
Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be consistently predicted.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z) - Learned Video Compression via Heterogeneous Deformable Compensation
Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-Neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves superior performance compared with recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z) - STIP: A SpatioTemporal Information-Preserving and Perception-Augmented
Model for High-Resolution Video Prediction [78.129039340528]
We propose a SpatioTemporal Information-Preserving and Perception-Augmented Model (STIP) to solve the above two problems.
The proposed model aims to preserve the spatiotemporal information of videos during feature extraction and state transitions.
Experimental results show that the proposed STIP can predict videos with more satisfactory visual quality compared with a variety of state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T09:49:04Z) - Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the first tradeoff, we propose a degradation scheme that reduces up to 40% of training time without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z) - Intrinsic Temporal Regularization for High-resolution Human Video
Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation.
We apply our intrinsic temporal regularization to a single-image generator, leading to a powerful "INTERnet" capable of generating 512×512 resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z) - Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.