VRT: A Video Restoration Transformer
- URL: http://arxiv.org/abs/2201.12288v1
- Date: Fri, 28 Jan 2022 17:54:43 GMT
- Title: VRT: A Video Restoration Transformer
- Authors: Jingyun Liang and Jiezhang Cao and Yuchen Fan and Kai Zhang and Rakesh Ranjan and Yawei Li and Radu Timofte and Luc Van Gool
- Abstract summary: Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames.
We propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
- Score: 126.79589717404863
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video restoration (e.g., video super-resolution) aims to restore high-quality
frames from low-quality frames. Different from single image restoration, video
restoration generally requires utilizing temporal information from multiple
adjacent but usually misaligned video frames. Existing deep methods generally
tackle this by exploiting a sliding window strategy or a recurrent
architecture, which is either restricted to frame-by-frame restoration or lacks
long-range modelling ability. In this paper, we propose a Video Restoration
Transformer (VRT) with parallel frame prediction and long-range temporal
dependency modelling abilities. More specifically, VRT is composed of multiple
scales, each of which consists of two kinds of modules: temporal mutual self
attention (TMSA) and parallel warping. TMSA divides the video into small clips,
on which mutual attention is applied for joint motion estimation, feature
alignment and feature fusion, while self attention is used for feature
extraction. To enable cross-clip interactions, the video sequence is shifted
for every other layer. Besides, parallel warping is used to further fuse
information from neighboring frames by parallel feature warping. Experimental
results on three tasks, including video super-resolution, video deblurring and
video denoising, demonstrate that VRT outperforms the state-of-the-art methods
by large margins (up to 2.16 dB) on nine benchmark datasets.
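
As a rough illustration of the mechanism the abstract describes, below is a minimal PyTorch sketch of a TMSA-style block: mutual attention between the two frames of a clip (standing in for joint motion estimation, feature alignment and fusion), self attention within the clip for feature extraction, and a temporal shift on every other layer for cross-clip interaction. The clip size of 2, the shared attention module, and all shapes are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class TMSABlock(nn.Module):
    """Sketch of temporal mutual self attention over 2-frame clips."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.mutual = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 2, tokens, dim) -- a small clip of two frames.
        b, t, n, d = clip.shape
        f0, f1 = clip[:, 0], clip[:, 1]
        # Mutual attention: each frame queries the other frame, which plays
        # the role of joint motion estimation, alignment and fusion.
        a0, _ = self.mutual(self.norm1(f0), self.norm1(f1), self.norm1(f1))
        a1, _ = self.mutual(self.norm1(f1), self.norm1(f0), self.norm1(f0))
        x = torch.stack([f0 + a0, f1 + a1], dim=1).reshape(b, t * n, d)
        # Self attention within the clip for feature extraction.
        s, _ = self.self_attn(self.norm2(x), self.norm2(x), self.norm2(x))
        return (x + s).reshape(b, t, n, d)


def shift_for_layer(video: torch.Tensor, layer_idx: int) -> torch.Tensor:
    # video: (batch, frames, tokens, dim). Rolling the frame axis by one on
    # every other layer moves clip boundaries so information crosses clips.
    return torch.roll(video, shifts=1, dims=1) if layer_idx % 2 else video
```

In the full VRT, blocks like this are stacked at multiple scales and complemented by flow-guided parallel warping; the sketch only shows the attention pattern.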
Related papers
- ViStripformer: A Token-Efficient Transformer for Versatile Video Restoration [42.356013390749204]
ViStripformer is an effective and efficient transformer architecture with much lower memory usage than the vanilla transformer.
It decomposes video frames into strip-shaped features along the horizontal and vertical directions for Intra-SA and Inter-SA, addressing degradation patterns with various orientations and magnitudes (a minimal sketch of this strip decomposition appears after this list).
arXiv Detail & Related papers (2023-12-22T08:05:38Z)
- Multi-entity Video Transformers for Fine-Grained Video Representation Learning [36.31020249963468]
We re-examine the design of transformer architectures for video representation learning.
A salient aspect of our self-supervised method is the improved integration of spatial information in the temporal pipeline.
Our Multi-entity Video Transformer (MV-Former) architecture achieves state-of-the-art results on multiple fine-grained video benchmarks.
arXiv Detail & Related papers (2023-11-17T21:23:12Z)
- Aggregating Long-term Sharp Features via Hybrid Transformers for Video Deblurring [76.54162653678871]
We propose a video deblurring method that leverages both neighboring frames and sharp frames present in the video, using hybrid Transformers for feature aggregation.
Our proposed method outperforms state-of-the-art video deblurring methods as well as event-driven video deblurring methods in terms of quantitative metrics and visual quality.
arXiv Detail & Related papers (2023-09-13T16:12:11Z)
- Task Agnostic Restoration of Natural Video Dynamics [10.078712109708592]
In many video restoration/translation tasks, image processing operations are naïvely extended to the video domain by processing each frame independently.
We propose a general framework for this task that learns to infer and utilize consistent motion dynamics from inconsistent videos to mitigate the temporal flicker.
The proposed framework produces SOTA results on two benchmark datasets, DAVIS and videvo.net, processed by numerous image processing applications.
arXiv Detail & Related papers (2022-06-08T09:00:31Z)
- Recurrent Video Restoration Transformer with Guided Deformable Attention [116.1684355529431]
We propose RVRT, which processes local neighboring frames in parallel within a globally recurrent framework.
RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory and runtime.
arXiv Detail & Related papers (2022-06-05T10:36:09Z)
- Learning Trajectory-Aware Transformer for Video Super-Resolution [50.49396123016185]
Video super-resolution aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts.
Existing approaches usually align and aggregate information from a limited number of adjacent frames.
We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR).
arXiv Detail & Related papers (2022-04-08T03:37:39Z)
- Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution [100.11355888909102]
Space-time video super-resolution aims at generating a high-resolution (HR) slow-motion video from a low-resolution (LR) and low frame rate (LFR) video sequence.
We present a one-stage space-time video super-resolution framework, which can directly reconstruct an HR slow-motion video sequence from an input LR and LFR video.
arXiv Detail & Related papers (2021-04-15T17:59:23Z)
- Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution [95.26202278535543]
A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).
However, temporal synthesis and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)
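
For the ViStripformer entry above, the following hedged PyTorch sketch shows the kind of strip-shaped decomposition its summary describes: each row and each column of a feature map is treated as its own token sequence, so attention cost stays linear in the other spatial dimension, which is where the memory savings over the vanilla transformer come from. Module names, shapes, and head counts are assumptions for illustration, and the paper's Inter-SA across frames is omitted.

```python
import torch
import torch.nn as nn


class StripSelfAttention(nn.Module):
    """Self attention along horizontal and vertical strips (Intra-SA sketch)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.h_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, height, width, dim)
        b, h, w, d = x.shape
        rows = x.reshape(b * h, w, d)                      # horizontal strips
        rows, _ = self.h_attn(rows, rows, rows)
        x = x + rows.reshape(b, h, w, d)
        cols = x.permute(0, 2, 1, 3).reshape(b * w, h, d)  # vertical strips
        cols, _ = self.v_attn(cols, cols, cols)
        return x + cols.reshape(b, w, h, d).permute(0, 2, 1, 3)
```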
This list is automatically generated from the titles and abstracts of the papers on this site.