Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
- URL: http://arxiv.org/abs/2401.06312v4
- Date: Fri, 29 Mar 2024 13:10:56 GMT
- Title: Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
- Authors: Xingyu Zhou, Leheng Zhang, Xiaorui Zhao, Keze Wang, Leida Li, Shuhang Gu
- Abstract summary: Vision Transformer has achieved great success in recovering missing details in low-resolution sequences.
Despite its superiority in VSR accuracy, the heavy computational burden and the large memory footprint hinder the deployment of Transformer-based VSR models.
We propose a novel feature-level masked processing framework: VSR with Masked Intra and inter frame Attention (MIA-VSR).
- Score: 46.74923772450212
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Vision Transformer has achieved great success in recovering missing details in low-resolution sequences, i.e., the video super-resolution (VSR) task. Despite its superiority in VSR accuracy, the heavy computational burden as well as the large memory footprint hinder the deployment of Transformer-based VSR models on constrained devices. In this paper, we address the above issue by proposing a novel feature-level masked processing framework: VSR with Masked Intra and inter frame Attention (MIA-VSR). The core of MIA-VSR is leveraging feature-level temporal continuity between adjacent frames to reduce redundant computations and make more rational use of previously enhanced SR features. Concretely, we propose an intra-frame and inter-frame attention block which takes the respective roles of past features and input features into consideration and only exploits previously enhanced features to provide supplementary information. In addition, an adaptive block-wise mask prediction module is developed to skip unimportant computations according to feature similarity between adjacent frames. We conduct detailed ablation studies to validate our contributions and compare the proposed method with recent state-of-the-art VSR approaches. The experimental results demonstrate that MIA-VSR improves the memory and computation efficiency over state-of-the-art methods, without trading off PSNR accuracy. The code is available at https://github.com/LabShuHangGU/MIA-VSR.
Related papers
- Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors [80.92195378575671]
We describe a strong baseline for arbitrary-scale video super-resolution (AVSR).
We then introduce ST-AVSR by equipping our baseline with a multi-scale structural and textural prior computed from the pre-trained VGG network.
Comprehensive experiments show that ST-AVSR significantly improves super-resolution quality, generalization ability, and inference speed over the state-of-the-art.
arXiv Detail & Related papers (2024-07-13T15:27:39Z)
- Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models [17.570136632211693]
We present StableVSR, a VSR method based on DMs that can enhance the perceptual quality of upscaled videos by synthesizing realistic and temporally-consistent details.
We demonstrate the effectiveness of StableVSR in enhancing the perceptual quality of upscaled videos while achieving better temporal consistency compared to existing state-of-the-art methods for VSR.
arXiv Detail & Related papers (2023-11-27T15:14:38Z)
- Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos [42.944135041061166]
We propose an altering-resolution framework, AR-Seg, for efficient semantic segmentation of compressed videos.
AR-Seg reduces the computational cost by using low resolution for non-keyframes, as sketched below.
Experiments on CamVid and Cityscapes show that AR-Seg achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-03-13T15:58:15Z)
- Sliding Window Recurrent Network for Efficient Video Super-Resolution [0.0]
Video super-resolution (VSR) is the task of restoring high-resolution frames from a sequence of low-resolution inputs.
We propose a Sliding Window based Recurrent Network (SWRN) that enables real-time inference while still achieving superior performance.
Our experiment on REDS dataset shows that the proposed method can be well adapted to mobile devices and produce visually pleasant results.
arXiv Detail & Related papers (2022-08-24T15:23:44Z)
- Boosting Video Super Resolution with Patch-Based Temporal Redundancy Optimization [46.833568886576074]
We discuss the influence of temporal redundancy in patches containing stationary objects and backgrounds.
We develop two simple yet effective plug-and-play methods to improve the performance of existing local and non-local propagation-based VSR algorithms.
arXiv Detail & Related papers (2022-07-18T15:11:18Z)
- Fast Online Video Super-Resolution with Deformable Attention Pyramid [172.16491820970646]
Video super-resolution (VSR) has many applications that pose strict causal, real-time, and latency constraints, including video streaming and TV.
We propose a recurrent VSR architecture based on a deformable attention pyramid (DAP).
arXiv Detail & Related papers (2022-02-03T17:49:04Z)
- BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond [75.62146968824682]
Video super-resolution (VSR) approaches tend to have more components than their image counterparts.
We show a succinct pipeline, BasicVSR, that achieves appealing improvements in terms of speed and restoration quality.
arXiv Detail & Related papers (2020-12-03T18:56:14Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frame correspondences are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z)
- Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNNs).
In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block that efficiently captures motion compensation information and feeds it back to the network adaptively.
arXiv Detail & Related papers (2020-02-15T13:14:10Z)