Temporal Consistency Learning of inter-frames for Video Super-Resolution
- URL: http://arxiv.org/abs/2211.01639v1
- Date: Thu, 3 Nov 2022 08:23:57 GMT
- Title: Temporal Consistency Learning of inter-frames for Video Super-Resolution
- Authors: Meiqin Liu, Shuo Jin, Chao Yao, Chunyu Lin and Yao Zhao
- Abstract summary: Video super-resolution (VSR) is a task that aims to reconstruct high-resolution (HR) frames from the low-resolution (LR) reference frame and multiple neighboring frames.
Existing methods generally explore information propagation and frame alignment to improve the performance of VSR.
We propose an end-to-end Temporal Consistency learning Network (TCNet) for VSR that enhances the consistency of the reconstructed videos.
- Score: 38.26035126565062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video super-resolution (VSR) is a task that aims to reconstruct
high-resolution (HR) frames from the low-resolution (LR) reference frame and
multiple neighboring frames. The key operation is to exploit the relatively
misaligned frames for reconstructing the current frame while preserving the
consistency of the results. Existing methods generally explore information
propagation and frame alignment to improve the performance of VSR, but few
studies focus on the temporal consistency of inter-frames. In this paper, we
propose a Temporal Consistency learning Network (TCNet) for VSR in an
end-to-end manner to enhance the consistency of the reconstructed videos. A
spatio-temporal stability module is designed to learn self-alignment from
inter-frames. In particular, correlative matching is employed to exploit the
spatial dependency of each frame to maintain structural stability. Moreover,
a self-attention mechanism is utilized to learn the temporal correspondence
and implement an adaptive warping operation for temporal consistency among
multiple frames. In addition, a hybrid recurrent architecture is designed to
leverage both short-term and long-term information. We further present a
progressive fusion module to perform multistage fusion of spatio-temporal
features, and the final reconstructed frames are refined by these fused
features. Objective and subjective results on various benchmark datasets
demonstrate that TCNet outperforms several state-of-the-art methods.
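The abstract above describes self-attention used as an adaptive warping operation across frames. The sketch below illustrates only that generic idea, with attention between reference-frame and neighbor-frame features acting as a soft, learned alignment. It is a minimal illustration under stated assumptions, not the authors' TCNet implementation; all function names, shapes, and the fusion step are hypothetical.

```python
# Minimal sketch (NOT the authors' TCNet code): self-attention between a
# reference frame's features and a neighbor frame's features, used as a
# soft "adaptive warping" instead of explicit flow-based alignment.
# All names and shapes are illustrative assumptions.
import torch

def attention_warp(ref_feat: torch.Tensor, nbr_feat: torch.Tensor) -> torch.Tensor:
    """Softly align neighbor features to the reference frame.

    ref_feat: (B, C, H, W) features of the frame being reconstructed.
    nbr_feat: (B, C, H, W) features of a temporally neighboring frame.
    Returns neighbor features re-aggregated toward the reference, (B, C, H, W).
    """
    b, c, h, w = ref_feat.shape
    q = ref_feat.flatten(2).transpose(1, 2)   # (B, HW, C) queries from reference
    k = nbr_feat.flatten(2).transpose(1, 2)   # (B, HW, C) keys from neighbor
    v = k                                     # values are the neighbor features
    # Each reference position attends to every neighbor position.
    attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)  # (B, HW, HW)
    warped = attn @ v                         # (B, HW, C) softly warped features
    return warped.transpose(1, 2).reshape(b, c, h, w)

# Toy usage: aggregate two neighbors toward the reference, then fuse naively.
ref = torch.randn(1, 16, 8, 8)
nbrs = [torch.randn(1, 16, 8, 8) for _ in range(2)]
fused = ref + sum(attention_warp(ref, n) for n in nbrs) / len(nbrs)
```

Note that full-frame attention is quadratic in the number of pixel positions, so practical VSR models typically restrict it to local windows or downsampled features; how TCNet constrains it is specified in the paper, not here.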
Related papers
- Continuous Space-Time Video Super-Resolution Utilizing Long-Range Temporal Information [48.20843501171717]
We propose a continuous ST-VSR (CSTVSR) method that can convert the given video to any frame rate and spatial resolution.
We show that the proposed algorithm has good flexibility and achieves better performance on various datasets.
arXiv Detail & Related papers (2023-02-26T08:02:39Z)
- Enhancing Space-time Video Super-resolution via Spatial-temporal Feature Interaction [9.456643513690633]
The aim of space-time video super-resolution (STVSR) is to increase both the frame rate and the spatial resolution of a video.
Recent approaches solve STVSR using end-to-end deep neural networks.
We propose a spatial-temporal feature interaction network to enhance STVSR by exploiting both spatial and temporal correlations.
arXiv Detail & Related papers (2022-07-18T22:10:57Z)
- Recurrent Video Restoration Transformer with Guided Deformable Attention [116.1684355529431]
We propose RVRT, which processes local neighboring frames in parallel within a globally recurrent framework.
RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory and runtime.
arXiv Detail & Related papers (2022-06-05T10:36:09Z)
- STDAN: Deformable Attention Network for Space-Time Video Super-Resolution [39.18399652834573]
We propose a deformable attention network called STDAN for STVSR.
First, we devise a long-short term feature (LSTFI) module, which is capable of extracting abundant content from more neighboring input frames.
Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts are adaptively captured and aggregated (see the deformable-alignment sketch after this list).
arXiv Detail & Related papers (2022-03-14T03:40:35Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frames are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution, and design new knowledge distillation methods to narrow the performance gap between compact models and large models.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
- Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved remarkable success thanks to the development of deep convolutional neural networks (CNNs).
In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block that can efficiently capture motion compensation and feed it back to the network in an adaptive way.
arXiv Detail & Related papers (2020-02-15T13:14:10Z)
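Several entries above (RVRT's guided deformable attention and STDAN's deformable feature aggregation) build on offset-guided deformable convolution. As a point of reference, here is a minimal, hypothetical sketch of generic deformable alignment, where offsets predicted from a reference/neighbor feature pair steer a deformable convolution that adaptively samples the neighbor. It uses torchvision's DeformConv2d and is not code from any of the papers listed; the module and parameter names are illustrative.

```python
# Minimal sketch of generic deformable feature alignment (illustrative only;
# not RVRT or STDAN code). Offsets are predicted from the concatenated
# reference/neighbor features and steer a deformable convolution that
# samples the neighbor's features adaptively.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformAlign(nn.Module):
    def __init__(self, channels: int = 16, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Predict an (x, y) offset for each of the kernel_size**2 taps.
        self.offset_conv = nn.Conv2d(2 * channels, 2 * kernel_size ** 2,
                                     kernel_size, padding=pad)
        self.deform_conv = DeformConv2d(channels, channels, kernel_size,
                                        padding=pad)

    def forward(self, ref_feat: torch.Tensor, nbr_feat: torch.Tensor) -> torch.Tensor:
        # Offsets depend on both frames, so the sampling adapts to motion.
        offsets = self.offset_conv(torch.cat([ref_feat, nbr_feat], dim=1))
        return self.deform_conv(nbr_feat, offsets)  # neighbor aligned to reference

# Toy usage.
aligner = DeformAlign()
ref, nbr = torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32)
aligned = aligner(ref, nbr)  # (1, 16, 32, 32)
```

In practice such aligners are stacked in coarse-to-fine pyramids and combined with attention or masks, as the papers above describe; this sketch shows only the core sampling mechanism.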
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.