DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models
- URL: http://arxiv.org/abs/2407.01519v3
- Date: Fri, 04 Oct 2024 14:37:13 GMT
- Title: DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models
- Authors: Chang-Han Yeh, Chin-Yang Lin, Zhixiang Wang, Chi-Wei Hsiao, Ting-Hsuan Chen, Hau-Shiang Shiu, Yu-Lun Liu
- Abstract summary: This paper introduces a method for zero-shot video restoration using pre-trained image restoration diffusion models.
We show that our method achieves top performance in zero-shot video restoration.
Our technique works with any 2D restoration diffusion model, offering a versatile and powerful tool for video enhancement tasks without extensive retraining.
- Score: 9.145545884814327
- Abstract: This paper introduces a method for zero-shot video restoration using pre-trained image restoration diffusion models. Traditional video restoration methods often need retraining for different settings and struggle with limited generalization across various degradation types and datasets. Our approach uses a hierarchical token merging strategy for keyframes and local frames, combined with a hybrid correspondence mechanism that blends optical flow and feature-based nearest neighbor matching (latent merging). We show that our method not only achieves top performance in zero-shot video restoration but also significantly surpasses trained models in generalization across diverse datasets and extreme degradations (8$\times$ super-resolution and high-standard deviation video denoising). We present evidence through quantitative metrics and visual comparisons on various challenging datasets. Additionally, our technique works with any 2D restoration diffusion model, offering a versatile and powerful tool for video enhancement tasks without extensive retraining. This research leads to more efficient and widely applicable video restoration technologies, supporting advancements in fields that require high-quality video output. See our project page for video results and source code at https://jimmycv07.github.io/DiffIR2VR_web/.
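The hybrid correspondence at the heart of the method can be made concrete. The following is a minimal sketch, not the authors' released code: the function names, the fixed blending weight, and the occlusion mask are illustrative assumptions. It backward-warps the keyframe latent with optical flow, falls back to cosine nearest-neighbor matching where the flow is marked unreliable, and merges the result into the local frame's latent:

```python
# Minimal sketch of hybrid-correspondence latent merging (illustrative only).
import torch
import torch.nn.functional as F

def warp_with_flow(latent, flow):
    """Backward-warp a latent (B, C, H, W) with optical flow (B, 2, H, W)."""
    B, _, H, W = latent.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(latent)    # (2, H, W), (x, y)
    coords = grid.unsqueeze(0) + flow                         # sampling locations
    coords[:, 0] = 2 * coords[:, 0] / (W - 1) - 1             # normalize x to [-1, 1]
    coords[:, 1] = 2 * coords[:, 1] / (H - 1) - 1             # normalize y to [-1, 1]
    return F.grid_sample(latent, coords.permute(0, 2, 3, 1), align_corners=True)

def nn_match(latent, key_latent):
    """Replace each token by its cosine nearest neighbor from the keyframe.
    The dense HW x HW similarity is fine for a sketch, not for production."""
    B, C, H, W = latent.shape
    q = F.normalize(latent.flatten(2), dim=1)                 # (B, C, HW)
    k = F.normalize(key_latent.flatten(2), dim=1)             # (B, C, HW)
    idx = torch.einsum("bcq,bck->bqk", q, k).argmax(dim=-1)   # best match per token
    return torch.gather(key_latent.flatten(2), 2,
                        idx.unsqueeze(1).expand(-1, C, -1)).view(B, C, H, W)

def merge_latents(latent, key_latent, flow, occ_mask, alpha=0.5):
    """Blend the local-frame latent with its keyframe correspondence:
    flow warping where the flow is valid, feature matching where it is not."""
    warped = warp_with_flow(key_latent, flow)
    matched = nn_match(latent, key_latent)
    corresp = torch.where(occ_mask.bool(), matched, warped)   # occ_mask: 1 = flow invalid
    return alpha * latent + (1 - alpha) * corresp
```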
Related papers
- TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration [13.49297560533422]
Our method can restore various types of video degradation with a single unified model.
Our method advances the video restoration task by providing a unified solution that enhances video quality across multiple applications.
arXiv Detail & Related papers (2025-01-04T12:15:37Z)
- Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression [0.0]
DiQP is a novel Transformer-Diffusion model for restoring 8K video quality degraded by compression.
Our architecture combines the power of Transformers to capture long-range dependencies with an enhanced windowed mechanism.
Our model outperforms state-of-the-art methods, particularly for high-resolution videos such as 4K and 8K.
arXiv Detail & Related papers (2024-12-12T03:49:22Z)
- Video Decomposition Prior: A Methodology to Decompose Videos into Layers [74.36790196133505]
This paper introduces a novel video decomposition prior (VDP) framework that derives inspiration from professional video editing practices.
The VDP framework decomposes a video sequence into a set of multiple RGB layers and associated opacity levels.
We address tasks such as video object segmentation, dehazing, and relighting.
arXiv Detail & Related papers (2024-12-06T10:35:45Z)
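As a quick illustration of what such a layered decomposition yields, a frame can be recomposed from the RGB layers and opacities by standard back-to-front alpha compositing. This small sketch uses a tensor layout of our own choosing, not the paper's:

```python
# Recompose one frame from RGB layers and opacities (illustrative layout).
import torch

def composite(layers, alphas):
    """layers: (L, 3, H, W) RGB layers, back-most first.
    alphas: (L, 1, H, W) opacities in [0, 1]."""
    frame = torch.zeros_like(layers[0])
    for rgb, a in zip(layers, alphas):
        frame = a * rgb + (1 - a) * frame   # "over" operator, back to front
    return frame
```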
- Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors [54.8852848659663]
Buffer Anytime is a framework for estimating depth and normal maps (which we call geometric buffers) from video.
We demonstrate high-quality video buffer estimation by leveraging single-image priors with temporal consistency constraints.
arXiv Detail & Related papers (2024-11-26T09:28:32Z)
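A generic version of the temporal consistency constraint mentioned above is easy to write down; this sketch is our own illustration, and the mask and loss form are assumptions rather than the paper's exact objective:

```python
# Penalize the current geometric buffer where it disagrees with the
# flow-warped previous one; valid_mask zeroes out occluded pixels.
import torch

def temporal_consistency_loss(pred_t, prev_warped, valid_mask):
    """pred_t, prev_warped: (B, C, H, W); valid_mask: (B, 1, H, W) in {0, 1}."""
    return (valid_mask * (pred_t - prev_warped).abs()).mean()
```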
- Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency [36.939731355462264]
This study proposes a novel and efficient blind video face enhancement method.
It restores high-quality videos from their compressed low-quality versions with an effective de-flickering mechanism.
Experiments conducted on the VFHQ-Test dataset demonstrate that our method surpasses the current state-of-the-art blind face video restoration and de-flickering methods on both efficiency and effectiveness.
arXiv Detail & Related papers (2024-11-25T15:14:36Z)
- ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free, plug-and-play method for generative video models.
We transform a video model into a self-cascaded video diffusion model with hidden state correction modules.
Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z)
- VCISR: Blind Single Image Super-Resolution with Video Compression Synthetic Data [18.877077302923713]
We present a video compression-based degradation model to synthesize low-resolution image data in the blind SISR task.
Our proposed image synthesizing method is widely applicable to existing image datasets.
By introducing video coding artifacts into SISR degradation models, neural networks can super-resolve images while also restoring video compression degradations.
arXiv Detail & Related papers (2023-11-02T05:24:19Z)
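The core trick of such a degradation model, injecting real codec artifacts into still images, can be approximated with an ffmpeg round-trip. The CRF value and paths below are illustrative, and the paper's actual degradation pipeline is more elaborate:

```python
# Push a still image through a lossy H.264 round-trip so it picks up
# genuine video-compression artifacts (illustrative, not the paper's pipeline).
import os
import subprocess
import tempfile

def add_codec_artifacts(png_in, png_out, crf=35):
    with tempfile.TemporaryDirectory() as tmp:
        mp4 = os.path.join(tmp, "clip.mp4")
        # Encode the single frame as a one-frame H.264 video at an aggressive CRF.
        # yuv420p assumes even image dimensions.
        subprocess.run(["ffmpeg", "-y", "-i", png_in, "-c:v", "libx264",
                        "-crf", str(crf), "-pix_fmt", "yuv420p", mp4], check=True)
        # Decode the frame back out; it now carries codec degradations.
        subprocess.run(["ffmpeg", "-y", "-i", mp4, "-frames:v", "1", png_out],
                       check=True)
```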
- Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation [92.55296042611886]
We propose a framework called "Reuse and Diffuse", dubbed $\textit{VidRD}$, to produce more frames following the frames already generated by an LDM.
We also propose a set of strategies for composing video-text data that involve diverse content from multiple existing datasets.
arXiv Detail & Related papers (2023-09-07T08:12:58Z)
- A Simple Baseline for Video Restoration with Grouped Spatial-temporal Shift [36.71578909392314]
In this study, we propose a simple yet effective framework for video restoration.
Our approach is based on grouped spatial-temporal shift, which is a lightweight and straightforward technique.
Our framework outperforms the previous state-of-the-art method, while using less than a quarter of its computational cost.
arXiv Detail & Related papers (2022-06-22T02:16:47Z)
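The shift idea above is simple enough to sketch. The following is in the spirit of the grouped temporal shift only (the paper also shifts spatially and uses its own grouping; the tensor layout and fold ratio here are assumptions): move one group of channels a frame forward and another a frame backward, so an ordinary 2D convolution afterwards sees temporal context at essentially zero cost.

```python
# TSM-style grouped temporal shift (illustrative; the paper's grouped
# spatial-temporal shift also shifts features spatially).
import torch

def grouped_temporal_shift(x, fold=8):
    """x: (B, T, C, H, W). Shifts C//fold channels from the previous frame
    and C//fold channels from the next frame into each timestep."""
    B, T, C, H, W = x.shape
    g = C // fold
    out = torch.zeros_like(x)
    out[:, 1:, :g] = x[:, :-1, :g]             # group 1: channels from frame t-1
    out[:, :-1, g:2 * g] = x[:, 1:, g:2 * g]   # group 2: channels from frame t+1
    out[:, :, 2 * g:] = x[:, :, 2 * g:]        # remaining channels unchanged
    return out
```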
- Recurrent Video Restoration Transformer with Guided Deformable Attention [116.1684355529431]
We propose RVRT, which processes local neighboring frames in parallel within a globally recurrent framework.
RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory and runtime.
arXiv Detail & Related papers (2022-06-05T10:36:09Z)
- VRT: A Video Restoration Transformer [126.79589717404863]
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames.
We propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
arXiv Detail & Related papers (2022-01-28T17:54:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.