A Simple Baseline for Video Restoration with Grouped Spatial-temporal
Shift
- URL: http://arxiv.org/abs/2206.10810v2
- Date: Mon, 22 May 2023 09:56:01 GMT
- Title: A Simple Baseline for Video Restoration with Grouped Spatial-temporal
Shift
- Authors: Dasong Li, Xiaoyu Shi, Yi Zhang, Ka Chun Cheung, Simon See, Xiaogang
Wang, Hongwei Qin, Hongsheng Li
- Abstract summary: In this study, we propose a simple yet effective framework for video restoration.
Our approach is based on grouped spatial-temporal shift, which is a lightweight and straightforward technique.
Our framework outperforms the previous state-of-the-art method, while using less than a quarter of its computational cost.
- Score: 36.71578909392314
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video restoration, which aims to restore clear frames from degraded videos,
has numerous important applications. The key to video restoration depends on
utilizing inter-frame information. However, existing deep learning methods
often rely on complicated network architectures, such as optical flow
estimation, deformable convolution, and cross-frame self-attention layers,
resulting in high computational costs. In this study, we propose a simple yet
effective framework for video restoration. Our approach is based on grouped
spatial-temporal shift, which is a lightweight and straightforward technique
that can implicitly capture inter-frame correspondences for multi-frame
aggregation. By introducing grouped spatial shift, we attain expansive
effective receptive fields. Combined with basic 2D convolution, this simple
framework can effectively aggregate inter-frame information. Extensive
experiments demonstrate that our framework outperforms the previous
state-of-the-art method, while using less than a quarter of its computational
cost, on both video deblurring and video denoising tasks. These results
indicate the potential for our approach to significantly reduce computational
overhead while maintaining high-quality results. Code is available at
https://github.com/dasongli1/Shift-Net.
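To make the core idea concrete, below is a minimal PyTorch sketch of a grouped spatial-temporal shift block: channel groups are borrowed from temporally shifted copies of the clip and displaced by different spatial offsets, after which a plain 2D convolution aggregates the inter-frame information. The specific offsets, group count, residual connection, and wrap-around rolling are illustrative assumptions; the official implementation is in the linked Shift-Net repository.
```python
import torch
import torch.nn as nn

class GroupedSTShift(nn.Module):
    """Minimal sketch of a grouped spatial-temporal shift block.

    Channel groups come from temporally shifted copies of the sequence and are
    displaced by different spatial offsets, so a plain 2D convolution can then
    aggregate multi-frame information. Offsets, group count, the residual
    connection and wrap-around rolling are assumptions, not the official
    Shift-Net configuration.
    """

    def __init__(self, channels):
        super().__init__()
        # (temporal shift, dy, dx) per channel group -- illustrative values
        self.shifts = [(-1, 0, 0), (-1, 4, 0), (-1, 0, 4),
                       (1, 0, 0), (1, -4, 0), (1, 0, -4)]
        assert channels % len(self.shifts) == 0
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # x: (B, T, C, H, W) stack of per-frame features
        b, t, c, h, w = x.shape
        groups = torch.chunk(x, len(self.shifts), dim=2)
        shifted = [torch.roll(g, shifts=s, dims=(1, 3, 4))   # the shift itself is FLOP-free
                   for g, s in zip(groups, self.shifts)]
        out = torch.cat(shifted, dim=2)
        # fold time into the batch so aggregation only needs 2D convolution
        out = self.fuse(out.reshape(b * t, c, h, w)).reshape(b, t, c, h, w)
        return x + out

frames = torch.randn(2, 5, 48, 64, 64)      # (B, T, C, H, W)
print(GroupedSTShift(48)(frames).shape)     # torch.Size([2, 5, 48, 64, 64])
```
Because the shifts cost no multiply-adds, the aggregation overhead reduces to ordinary 2D convolutions, which is consistent with the low computational cost reported in the abstract.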
Related papers
- DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models [9.145545884814327]
This paper introduces a method for zero-shot video restoration using pre-trained image restoration diffusion models.
We show that our method achieves top performance in zero-shot video restoration.
Our technique works with any 2D restoration diffusion model, offering a versatile and powerful tool for video enhancement tasks without extensive retraining.
arXiv Detail & Related papers (2024-07-01T17:59:12Z) - A Simple Recipe for Contrastively Pre-training Video-First Encoders
Beyond 16 Frames [54.90226700939778]
We build on the common paradigm of transferring large-scale, image-text models to video via shallow temporal fusion.
We expose two limitations of the approach: (1) decreased spatial capabilities, likely due to poor video-language alignment in standard video datasets, and (2) higher memory consumption, bottlenecking the number of frames that can be processed.
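For reference, the shallow temporal fusion paradigm mentioned above can be illustrated roughly as follows: frames are encoded independently by a pretrained image encoder and mixed by a single light temporal layer. The encoder, embedding size and fusion layer here are placeholders, not the paper's video-first architecture.
```python
import torch
import torch.nn as nn

class ShallowTemporalFusion(nn.Module):
    """Generic illustration of shallow temporal fusion (placeholder components)."""

    def __init__(self, image_encoder, dim=512):
        super().__init__()
        self.image_encoder = image_encoder   # any frame-level encoder mapping images to dim-d vectors
        self.temporal = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)

    def forward(self, video):
        # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        frame_emb = self.image_encoder(video.flatten(0, 1))   # (B*T, dim), frames encoded independently
        frame_emb = frame_emb.view(b, t, -1)
        fused = self.temporal(frame_emb)                      # single shallow cross-frame mixing layer
        return fused.mean(dim=1)                              # one embedding per clip
```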
arXiv Detail & Related papers (2023-12-12T16:10:19Z) - Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame interpolation is an important low-level computer vision task, which can increase the frame rate for a more fluent visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI can reduce computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-07T06:41:15Z) - ReBotNet: Fast Real-time Video Enhancement [59.08038313427057]
Most restoration networks are slow, have a high computational cost, and cannot be used for real-time video enhancement.
In this work, we design an efficient and fast framework to perform real-time enhancement for practical use-cases like live video calls and video streams.
To evaluate our method, we curate two new datasets that emulate real-world video call and streaming scenarios, and show extensive results on multiple datasets where ReBotNet outperforms existing approaches with lower computation, reduced memory requirements, and faster inference time.
arXiv Detail & Related papers (2023-03-23T17:58:05Z) - Deep Unsupervised Key Frame Extraction for Efficient Video
Classification [63.25852915237032]
This work presents an unsupervised method to retrieve key frames, which combines a Convolutional Neural Network (CNN) with Temporal Segment Density Peaks Clustering (TSDPC).
The proposed TSDPC is a generic and powerful framework with two advantages over previous works, one of which is that it can determine the number of key frames automatically.
Furthermore, a Long Short-Term Memory network (LSTM) is added on top of the CNN to further improve classification performance.
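As a rough illustration of the density-peaks idea behind TSDPC, the sketch below scores frames by local density and separation over per-frame CNN features and selects the peaks as key frames. The distance cutoff, the rho*delta score and the gap-based automatic count are generic density-peaks heuristics, not the paper's exact formulation.
```python
import numpy as np

def density_peak_keyframes(features, d_c=0.5, top_k=None):
    """Select key frames from per-frame CNN features (shape (N, D)) with a
    density-peaks style score; hyperparameters are illustrative."""
    dist = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    rho = (dist < d_c).sum(axis=1) - 1                # local density of each frame
    order = np.argsort(-rho)                          # frames from dense to sparse
    delta = np.zeros(len(features))
    delta[order[0]] = dist[order[0]].max()            # densest frame gets the largest separation
    for i, idx in enumerate(order[1:], start=1):
        delta[idx] = dist[idx, order[:i]].min()       # distance to the nearest denser frame
    score = rho * delta                               # key frames are high in both
    if top_k is None:                                 # pick k at the largest gap in sorted scores
        s = np.sort(score)[::-1]
        top_k = int(np.argmax(s[:-1] - s[1:])) + 1
    return np.argsort(-score)[:top_k]
```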
arXiv Detail & Related papers (2022-11-12T20:45:35Z) - Sliding Window Recurrent Network for Efficient Video Super-Resolution [0.0]
Video super-resolution (VSR) is the task of restoring high-resolution frames from a sequence of low-resolution inputs.
We propose a Sliding Window based Recurrent Network (SWRN) which enables real-time inference while still achieving superior performance.
Our experiment on REDS dataset shows that the proposed method can be well adapted to mobile devices and produce visually pleasant results.
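The sliding-window recurrence can be sketched roughly as below: a small window of neighbouring low-resolution frames is fused with a hidden state that is carried forward frame by frame. The 3-frame window, channel width, x4 upscale and plain convolutional body are assumptions for illustration, not the SWRN paper's exact design.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlidingWindowRecurrentSR(nn.Module):
    """Illustrative sliding-window recurrent super-resolution loop (not SWRN itself)."""

    def __init__(self, channels=32, scale=4):
        super().__init__()
        self.channels, self.scale = channels, scale
        self.encode = nn.Conv2d(3 * 3 + channels, channels, 3, padding=1)
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.to_rgb = nn.Conv2d(channels, 3 * scale * scale, 3, padding=1)

    def forward(self, frames):
        # frames: (B, T, 3, H, W) low-resolution sequence
        b, t, _, h, w = frames.shape
        hidden = frames.new_zeros(b, self.channels, h, w)
        outputs = []
        for i in range(t):
            # sliding window over the previous, current and next frame (clamped at the ends)
            window = [frames[:, max(i - 1, 0)], frames[:, i], frames[:, min(i + 1, t - 1)]]
            feat = F.relu(self.encode(torch.cat(window + [hidden], dim=1)))
            hidden = self.body(feat) + feat                  # recurrent state carried forward
            outputs.append(F.pixel_shuffle(self.to_rgb(hidden), self.scale))
        return torch.stack(outputs, dim=1)                   # (B, T, 3, scale*H, scale*W)
```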
arXiv Detail & Related papers (2022-08-24T15:23:44Z) - Efficient Spatio-Temporal Recurrent Neural Network for Video Deblurring [39.63844562890704]
Real-time deblurring still remains a challenging task due to the complexity of spatially and temporally varying blur itself.
We adopt residual dense blocks into RNN cells, so as to efficiently extract the spatial features of the current frame.
We contribute a novel dataset (BSD) to the community, by collecting paired blurry/sharp video clips using a co-axis beam splitter acquisition system.
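A hedged sketch of the "residual dense blocks inside RNN cells" idea mentioned above is given below; the block depth, growth rate and the way the hidden state is mixed in are assumptions, not the paper's actual architecture.
```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Small residual dense block (depth and growth rate are illustrative)."""
    def __init__(self, channels, growth=16, layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth, 3, padding=1) for i in range(layers))
        self.fuse = nn.Conv2d(channels + layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))  # densely connected
        return x + self.fuse(torch.cat(feats, dim=1))                # local residual

class RDBCell(nn.Module):
    """RNN cell that extracts the current frame's spatial features with a
    residual dense block and mixes them with the recurrent hidden state."""
    def __init__(self, in_ch=3, channels=32):
        super().__init__()
        self.inp = nn.Conv2d(in_ch + channels, channels, 3, padding=1)
        self.rdb = ResidualDenseBlock(channels)

    def forward(self, frame, hidden):
        h = self.rdb(self.inp(torch.cat([frame, hidden], dim=1)))
        return h  # new hidden state, also fed to the reconstruction head
```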
arXiv Detail & Related papers (2021-06-30T12:53:02Z) - Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
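For context, a generic per-pixel knowledge distillation loss for segmentation is sketched below; the paper designs its own distillation terms, which are not reproduced here, and the temperature and weighting used are illustrative.
```python
import torch.nn.functional as F

def pixelwise_distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard soft-target distillation for segmentation, shown only to
    illustrate the idea of narrowing the compact-vs-large model gap.
    Shapes: logits (B, C, H, W), labels (B, H, W)."""
    ce = F.cross_entropy(student_logits, labels, ignore_index=255)   # supervised term
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),          # match the teacher's soft labels
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return alpha * ce + (1.0 - alpha) * kd
```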
arXiv Detail & Related papers (2020-02-26T12:24:32Z)