You Only Align Once: Bidirectional Interaction for Spatial-Temporal
Video Super-Resolution
- URL: http://arxiv.org/abs/2207.06345v1
- Date: Wed, 13 Jul 2022 17:01:16 GMT
- Title: You Only Align Once: Bidirectional Interaction for Spatial-Temporal
Video Super-Resolution
- Authors: Mengshun Hu, Kui Jiang, Zhixiang Nie, Zheng Wang
- Abstract summary: We propose an efficient recurrent network with bidirectional interaction for ST-VSR.
It first performs backward inference from future to past, and then follows forward inference to super-resolve intermediate frames.
Our method outperforms state-of-the-art methods in efficiency, and reduces calculation cost by about 22%.
- Score: 14.624610700550754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial-Temporal Video Super-Resolution (ST-VSR) technology generates
high-quality videos with higher resolution and higher frame rates. Existing
advanced methods accomplish ST-VSR tasks through the association of Spatial and
Temporal video super-resolution (S-VSR and T-VSR). These methods require two
alignments and fusions in S-VSR and T-VSR, which is obviously redundant and
fails to sufficiently explore the information flow of consecutive spatial LR
frames. Although bidirectional learning (future-to-past and past-to-future) was
introduced to cover all input frames, the direct fusion of final predictions
fails to sufficiently exploit intrinsic correlations of bidirectional motion
learning and spatial information from all frames. We propose an effective yet
efficient recurrent network with bidirectional interaction for ST-VSR, where
only one alignment and fusion is needed. Specifically, it first performs
backward inference from future to past, and then follows forward inference to
super-resolve intermediate frames. The backward and forward inferences are
assigned to learn structures and details to simplify the learning task with
joint optimizations. Furthermore, a Hybrid Fusion Module (HFM) is designed to
aggregate and distill information to refine spatial information and reconstruct
high-quality video frames. Extensive experiments on two public datasets
demonstrate that our method outperforms state-of-the-art methods in efficiency,
and reduces calculation cost by about 22%.
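The abstract describes a recurrent pipeline with a single shared alignment-and-fusion step: a backward (future-to-past) pass, a forward (past-to-future) pass that reuses the backward states, and a Hybrid Fusion Module (HFM) that merges the two before reconstruction. The sketch below is a minimal, hypothetical PyTorch rendering of that control flow; the class names (BidirectionalSTVSR, HybridFusion), the plain-convolution recurrent cells standing in for alignment, the layer sizes, and the omission of intermediate-frame synthesis (the T-VSR half) are illustration choices, not the authors' implementation.

```python
# Hypothetical sketch of the bidirectional-interaction recurrence described in
# the abstract: one backward pass, one forward pass, one fusion per frame.
# Module names and layer choices are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridFusion(nn.Module):
    """Stand-in for the paper's Hybrid Fusion Module (HFM): aggregates the
    backward and forward hidden states before reconstruction."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, h_back: torch.Tensor, h_fwd: torch.Tensor) -> torch.Tensor:
        return F.relu(self.fuse(torch.cat([h_back, h_fwd], dim=1)))


class BidirectionalSTVSR(nn.Module):
    def __init__(self, channels: int = 64, scale: int = 4):
        super().__init__()
        self.scale = scale
        self.extract = nn.Conv2d(3, channels, 3, padding=1)
        # One recurrent cell per direction; each consumes the current frame
        # feature plus the propagated hidden state (the single alignment and
        # fusion of the real model would live inside these cells).
        self.backward_cell = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.forward_cell = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.hfm = HybridFusion(channels)
        self.reconstruct = nn.Conv2d(channels, 3 * scale * scale, 3, padding=1)

    def forward(self, lr_frames: torch.Tensor) -> torch.Tensor:
        # lr_frames: (B, T, 3, H, W) low-resolution input clip.
        b, t, _, h, w = lr_frames.shape
        feats = [F.relu(self.extract(lr_frames[:, i])) for i in range(t)]

        # Backward inference: propagate a hidden state from future to past.
        hidden = feats[0].new_zeros(b, feats[0].shape[1], h, w)
        back = [None] * t
        for i in range(t - 1, -1, -1):
            hidden = F.relu(self.backward_cell(torch.cat([feats[i], hidden], dim=1)))
            back[i] = hidden

        # Forward inference: reuse the backward states, fuse once per frame,
        # then reconstruct the high-resolution output via pixel shuffle.
        hidden = feats[0].new_zeros(b, feats[0].shape[1], h, w)
        outputs = []
        for i in range(t):
            hidden = F.relu(self.forward_cell(torch.cat([feats[i], hidden], dim=1)))
            fused = self.hfm(back[i], hidden)
            outputs.append(F.pixel_shuffle(self.reconstruct(fused), self.scale))
        return torch.stack(outputs, dim=1)  # (B, T, 3, scale*H, scale*W)


# Usage: 4x upscale a batch of 2 clips with 5 LR frames of size 32x32.
if __name__ == "__main__":
    model = BidirectionalSTVSR()
    out = model(torch.randn(2, 5, 3, 32, 32))
    print(out.shape)  # torch.Size([2, 5, 3, 128, 128])
```

The intended point of this arrangement, per the abstract, is that each frame's features are propagated and fused once per direction rather than being aligned and fused separately in S-VSR and T-VSR stages.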
Related papers
- Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors [80.92195378575671]
We describe a strong baseline for arbitrary-scale video super-resolution (AVSR).
We then introduce ST-AVSR by equipping our baseline with a multi-scale structural and textural prior computed from the pre-trained VGG network.
Comprehensive experiments show that ST-AVSR significantly improves super-resolution quality, generalization ability, and inference speed over the state-of-the-art.
arXiv Detail & Related papers (2024-07-13T15:27:39Z)
- Continuous Space-Time Video Super-Resolution Utilizing Long-Range Temporal Information [48.20843501171717]
We propose a continuous ST-VSR (CSTVSR) method that can convert the given video to any frame rate and spatial resolution.
We show that the proposed algorithm has good flexibility and achieves better performance on various datasets.
arXiv Detail & Related papers (2023-02-26T08:02:39Z)
- Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning [48.68503274323906]
We propose a Cycle-projected Mutual learning network (CycMu-Net) for ST-VSR.
CycMu-Net makes full use of spatial-temporal correlations via the mutual learning between S-VSR and T-VSR.
Our method significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-05-11T04:30:47Z)
- STDAN: Deformable Attention Network for Space-Time Video Super-Resolution [39.18399652834573]
We propose a deformable attention network called STDAN for STVSR.
First, we devise a long-short term feature interpolation (LSTFI) module, which extracts abundant content from more neighboring input frames.
Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts are adaptively captured and aggregated.
arXiv Detail & Related papers (2022-03-14T03:40:35Z)
- Fast Online Video Super-Resolution with Deformable Attention Pyramid [172.16491820970646]
Video super-resolution (VSR) has many applications that pose strict causal, real-time, and latency constraints, including video streaming and TV.
We propose a recurrent VSR architecture based on a deformable attention pyramid (DAP).
arXiv Detail & Related papers (2022-02-03T17:49:04Z)
- Optical-Flow-Reuse-Based Bidirectional Recurrent Network for Space-Time Video Super-Resolution [52.899234731501075]
Space-time video super-resolution (ST-VSR) simultaneously increases the spatial resolution and frame rate for a given video.
Existing methods typically suffer from difficulties in how to efficiently leverage information from a large range of neighboring frames.
We propose a coarse-to-fine bidirectional recurrent neural network instead of using ConvLSTM to leverage knowledge between adjacent frames.
arXiv Detail & Related papers (2021-10-13T15:21:30Z)
- Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution [95.26202278535543]
A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).
However, temporal interpolation and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.