Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution
- URL: http://arxiv.org/abs/2505.02159v1
- Date: Sun, 04 May 2025 15:46:34 GMT
- Title: Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution
- Authors: Xingyu Zhou, Wei Long, Jingbo Lu, Shiyin Jiang, Weiyi You, Haifeng Wu, Shuhang Gu
- Abstract summary: Video super-resolution (VSR) can achieve better performance compared to single image super-resolution by leveraging temporal information. We propose LRTI-VSR, a novel training framework for recurrent VSR that efficiently leverages Long-Range Refocused Temporal Information. Our framework includes a generic training strategy that utilizes temporal propagation features from long video clips while training on shorter video clips.
- Score: 20.07870850150666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video super-resolution (VSR) can achieve better performance compared to single image super-resolution by additionally leveraging temporal information. In particular, the recurrent-based VSR model exploits long-range temporal information during inference and achieves superior detail restoration. However, effectively learning these long-term dependencies within long videos remains a key challenge. To address this, we propose LRTI-VSR, a novel training framework for recurrent VSR that efficiently leverages Long-Range Refocused Temporal Information. Our framework includes a generic training strategy that utilizes temporal propagation features from long video clips while training on shorter video clips. Additionally, we introduce a refocused intra&inter-frame transformer block which allows the VSR model to selectively prioritize useful temporal information through its attention module while further improving inter-frame information utilization in the FFN module. We evaluate LRTI-VSR on both CNN and transformer-based VSR architectures, conducting extensive ablation studies to validate the contribution of each component. Experiments on long-video test sets demonstrate that LRTI-VSR achieves state-of-the-art performance while maintaining training and computational efficiency.
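To make the training strategy concrete, below is a minimal PyTorch sketch of the short-clip idea described in the abstract: hidden states are first propagated through a long clip without gradients, and the supervised update is then applied only on a short suffix that inherits this long-range state. The cell, shapes, and clip lengths are illustrative stand-ins, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyRecurrentVSR(nn.Module):
    """Stand-in for a recurrent VSR cell: fuses the current LR frame with a
    propagated hidden state and predicts a x4 HR frame."""
    def __init__(self, ch=32, scale=4):
        super().__init__()
        self.scale = scale
        self.fuse = nn.Conv2d(3 + ch, ch, 3, padding=1)
        self.up = nn.Conv2d(ch, 3 * scale * scale, 3, padding=1)

    def step(self, lr, h):
        h = F.relu(self.fuse(torch.cat([lr, h], dim=1)))
        sr = F.pixel_shuffle(self.up(h), self.scale) + \
             F.interpolate(lr, scale_factor=self.scale, mode="bilinear", align_corners=False)
        return sr, h

model = ToyRecurrentVSR()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One long clip: 30 LR frames, with HR supervision only for the last 6.
lr_long = torch.rand(1, 30, 3, 32, 32)
hr_short = torch.rand(1, 6, 3, 128, 128)

# 1) Propagate through the long prefix WITHOUT gradients, so long-range
#    temporal features are accumulated at negligible memory cost.
h = torch.zeros(1, 32, 32, 32)
with torch.no_grad():
    for t in range(24):
        _, h = model.step(lr_long[:, t], h)

# 2) Train only on the short suffix, initialized with the propagated state,
#    so the model still sees long-range information at short-clip cost.
loss = 0.0
for t in range(24, 30):
    sr, h = model.step(lr_long[:, t], h)
    loss = loss + F.l1_loss(sr, hr_short[:, t - 24])
opt.zero_grad(); loss.backward(); opt.step()
```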
Related papers
- FCA2: Frame Compression-Aware Autoencoder for Modular and Fast Compressed Video Super-Resolution [68.77813885751308]
State-of-the-art (SOTA) compressed video super-resolution (CVSR) models face persistent challenges, including prolonged inference time, complex training pipelines, and reliance on auxiliary information. We propose an efficient and scalable solution inspired by the structural and statistical similarities between hyperspectral images (HSI) and video data. Our approach introduces a compression-driven dimensionality reduction strategy that reduces computational complexity, accelerates inference, and enhances the extraction of temporal information across frames.
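A toy sketch of the compression-driven reduction: a stack of frames is encoded into a low-dimensional latent (much as hyperspectral bands are reduced), restored there cheaply, and decoded back. Layer sizes and names are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class FrameCompressionAE(nn.Module):
    """Toy compression-aware autoencoder: squeezes a stack of frames into a
    low-dimensional latent, runs a cheap restoration body there, then
    decodes back to the frame stack."""
    def __init__(self, frames=7, latent=16):
        super().__init__()
        self.encode = nn.Conv2d(frames * 3, latent, 3, padding=1)  # dimensionality reduction
        self.body = nn.Sequential(nn.Conv2d(latent, latent, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(latent, latent, 3, padding=1))
        self.decode = nn.Conv2d(latent, frames * 3, 3, padding=1)

    def forward(self, clip):                      # clip: (B, T, 3, H, W)
        b, t, c, h, w = clip.shape
        x = self.encode(clip.reshape(b, t * c, h, w))
        x = self.body(x) + x                      # restoration in the compressed space
        return self.decode(x).reshape(b, t, c, h, w)

out = FrameCompressionAE()(torch.rand(2, 7, 3, 64, 64))  # -> (2, 7, 3, 64, 64)
```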
arXiv Detail & Related papers (2025-06-13T07:59:52Z) - ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning [68.76048244253582]
We introduce ViaRL, the first framework to leverage rule-based reinforcement learning (RL) for optimizing frame selection in video understanding. ViaRL utilizes the answer accuracy of a downstream model as a reward signal to train a frame selector through trial-and-error. ViaRL consistently delivers superior temporal grounding performance and robust generalization across diverse video understanding tasks.
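A toy REINFORCE-style sketch of the reward loop the summary describes: a selector scores frames, samples a subset, and is reinforced by the downstream model's answer accuracy. The downstream reward is stubbed with a random 0/1 outcome here, and all modules are illustrative.

```python
import torch
import torch.nn as nn

class FrameSelector(nn.Module):
    """Toy policy: emits one selection logit per candidate frame."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, frame_feats):                   # (T, feat_dim)
        return self.score(frame_feats).squeeze(-1)    # (T,)

selector = FrameSelector()
opt = torch.optim.Adam(selector.parameters(), lr=1e-3)

frame_feats = torch.rand(32, 128)                     # features of 32 candidate frames
probs = torch.softmax(selector(frame_feats), dim=0)
dist = torch.distributions.Categorical(probs)
picks = dist.sample((8,))                             # sample 8 frames (with replacement here)

# Reward = answer accuracy of the frozen downstream model on the picked
# frames; stubbed as a random 0/1 outcome in this sketch.
reward = float(torch.randint(0, 2, (1,)))

# REINFORCE: raise the log-probability of picks that led to a correct answer
# (0.5 acts as a trivial baseline).
loss = -(dist.log_prob(picks).sum() * (reward - 0.5))
opt.zero_grad(); loss.backward(); opt.step()
```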
arXiv Detail & Related papers (2025-05-21T12:29:40Z) - Cascaded Temporal Updating Network for Efficient Video Super-Resolution [47.63267159007611]
Key components in recurrent-based VSR networks significantly impact model efficiency.
We propose a cascaded temporal updating network (CTUN) for efficient VSR.
CTUN achieves a favorable trade-off between efficiency and performance compared to existing methods.
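As a rough illustration of cascaded, unidirectional updating, the toy cell below refreshes a single forward-propagated hidden state with a one-frame look-ahead instead of running a full backward pass over the whole sequence. This is a loose sketch of the idea, not CTUN's actual module.

```python
import torch
import torch.nn as nn

class CascadedUpdateCell(nn.Module):
    """Illustrative only: a unidirectional hidden state that is cheaply
    refreshed with a one-frame look-ahead, approximating future context
    without a second propagation pass."""
    def __init__(self, ch=32):
        super().__init__()
        self.update = nn.Conv2d(ch + 3 + 3, ch, 3, padding=1)

    def forward(self, h, cur, nxt):
        return torch.relu(self.update(torch.cat([h, cur, nxt], dim=1)))

cell = CascadedUpdateCell()
frames = torch.rand(16, 3, 64, 64)                 # a 16-frame LR sequence
h = torch.zeros(1, 32, 64, 64)
for t in range(15):                                # single forward pass over time
    h = cell(h, frames[t:t+1], frames[t+1:t+2])    # hidden state cascades forward
```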
arXiv Detail & Related papers (2024-08-26T12:59:32Z) - Time-series Initialization and Conditioning for Video-agnostic Stabilization of Video Super-Resolution using Recurrent Networks [13.894981567082997]
A Recurrent Neural Network (RNN) for Video Super Resolution (VSR) is generally trained with randomly clipped and cropped short videos.
Since this RNN is optimized to super-resolve short videos, VSR of long videos is degraded due to the domain gap.
This paper proposes a training strategy for RNN-based VSR that works efficiently and stably regardless of video length and dynamics.
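One simple instance of such initialization is to derive the initial hidden state from the first frame instead of zeros, as sketched below. This is an illustrative reading of the idea, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

class HiddenInit(nn.Module):
    """Illustrative initializer: produces the RNN's initial hidden state
    from the first frame rather than from zeros, one way to shrink the
    domain gap between short training clips and long test videos."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, first_frame):
        return self.net(first_frame)

h0 = HiddenInit()(torch.rand(1, 3, 64, 64))  # content-aware initial state
```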
arXiv Detail & Related papers (2024-03-23T13:16:07Z) - Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention [46.74923772450212]
Vision Transformer has achieved great success in recovering missing details in low-resolution sequences.
Despite its superiority in VSR accuracy, the heavy computational burden and the large memory footprint hinder the deployment of Transformer-based VSR models.
We propose a novel feature-level masked processing framework: VSR with Masked Intra- and Inter-frame Attention (MIA-VSR).
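The masked-processing idea can be sketched as a block-level change mask between adjacent frame features. For clarity, the heavy branch below is computed everywhere and merely blended; a real implementation would skip computation inside unchanged blocks to actually save FLOPs.

```python
import torch
import torch.nn.functional as F

def heavy_branch(feat):
    # Stand-in for the expensive intra/inter-frame attention computation.
    return feat * 2.0

def masked_update(prev_feat, cur_feat, prev_out, block=8, tau=0.05):
    """Reuse the previous frame's output wherever the feature block barely
    changed, recomputing only in blocks flagged by the mask."""
    diff = (cur_feat - prev_feat).abs().mean(dim=1, keepdim=True)   # per-pixel change
    mask = (F.avg_pool2d(diff, block) > tau).float()                # coarse block mask
    mask = F.interpolate(mask, scale_factor=block, mode="nearest")  # back to full size
    return mask * heavy_branch(cur_feat) + (1.0 - mask) * prev_out

out = masked_update(torch.rand(1, 16, 64, 64), torch.rand(1, 16, 64, 64),
                    torch.rand(1, 16, 64, 64))
```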
arXiv Detail & Related papers (2024-01-12T00:49:49Z) - Temporal Consistency Learning of inter-frames for Video Super-Resolution [38.26035126565062]
Video super-resolution (VSR) is a task that aims to reconstruct high-resolution (HR) frames from the low-resolution (LR) reference frame and multiple neighboring frames.
Existing methods generally explore information propagation and frame alignment to improve the performance of VSR.
We propose a Temporal Consistency learning Network (TCNet) for VSR, trained end-to-end to enhance the consistency of the reconstructed videos.
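One common way to realize such a consistency objective is to warp the previous output into the current frame using optical flow and penalize the difference. The sketch below assumes a given flow field and is a generic consistency loss, not TCNet's exact formulation.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img with a dense flow field (B, 2, H, W) via grid_sample."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float().unsqueeze(0) + flow
    grid[:, 0] = 2.0 * grid[:, 0] / (w - 1) - 1.0   # normalize x to [-1, 1]
    grid[:, 1] = 2.0 * grid[:, 1] / (h - 1) - 1.0   # normalize y to [-1, 1]
    return F.grid_sample(img, grid.permute(0, 2, 3, 1), align_corners=True)

def temporal_consistency_loss(sr_prev, sr_cur, flow_prev_to_cur):
    """Penalize disagreement between the current output and the previous
    output warped into the current frame."""
    return F.l1_loss(sr_cur, warp(sr_prev, flow_prev_to_cur))

loss = temporal_consistency_loss(torch.rand(1, 3, 64, 64),
                                 torch.rand(1, 3, 64, 64),
                                 torch.zeros(1, 2, 64, 64))
```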
arXiv Detail & Related papers (2022-11-03T08:23:57Z) - Sliding Window Recurrent Network for Efficient Video Super-Resolution [0.0]
Video super-resolution (VSR) is the task of restoring high-resolution frames from a sequence of low-resolution inputs.
We propose a Sliding Window based Recurrent Network (SWRN) that supports real-time inference while still achieving superior performance.
Our experiment on REDS dataset shows that the proposed method can be well adapted to mobile devices and produce visually pleasant results.
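A toy sketch of a sliding-window recurrent step: each frame is processed together with a small local window (previous, current, next) plus a recurrent hidden state, which keeps per-frame cost low. Shapes and the cell itself are illustrative.

```python
import torch
import torch.nn as nn

class SlidingWindowCell(nn.Module):
    """Illustrative cell: fuses a 3-frame window with a recurrent state."""
    def __init__(self, ch=16):
        super().__init__()
        self.body = nn.Conv2d(9 + ch, ch, 3, padding=1)  # 3 RGB frames + hidden

    def forward(self, window, h):                        # window: (B, 9, H, W)
        return torch.relu(self.body(torch.cat([window, h], dim=1)))

cell = SlidingWindowCell()
frames = torch.rand(10, 3, 64, 64)
h = torch.zeros(1, 16, 64, 64)
for t in range(1, 9):                                    # slide over the sequence
    window = frames[t-1:t+2].reshape(1, 9, 64, 64)       # prev, cur, next
    h = cell(window, h)
```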
arXiv Detail & Related papers (2022-08-24T15:23:44Z) - Self-Supervised Adaptation for Video Super-Resolution [7.26562478548988]
Single-image super-resolution (SISR) networks can adapt their network parameters to specific input images.
We present a new learning algorithm that allows conventional video super-resolution (VSR) networks to adapt their parameters to test video frames.
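The adaptation idea can be sketched as a test-time loop over self-generated pseudo pairs: downscale the test frame, then train the network to restore it, with the original frame as ground truth. The x2 toy upscaler below stands in for any pretrained SR/VSR network.

```python
import torch
import torch.nn.functional as F

# Toy x2 upscaler standing in for a pretrained network.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 12, 3, padding=1),
                            torch.nn.PixelShuffle(2))
opt = torch.optim.Adam(model.parameters(), lr=1e-5)

test_frame = torch.rand(1, 3, 64, 64)  # an LR test frame

# Self-supervised pseudo-pair: the downscaled frame is the input, the
# original test frame is the target.
for _ in range(5):                      # a few adaptation steps at test time
    child = F.interpolate(test_frame, scale_factor=0.5, mode="bicubic")
    loss = F.l1_loss(model(child), test_frame)
    opt.zero_grad(); loss.backward(); opt.step()
```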
arXiv Detail & Related papers (2021-03-18T08:30:24Z) - BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond [75.62146968824682]
Video super-resolution (VSR) approaches tend to have more components than their image counterparts.
We show a succinct pipeline, BasicVSR, that achieves appealing improvements in terms of speed and restoration quality.
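The essential bidirectional-propagation pipeline can be sketched as follows: a forward and a backward recurrent pass over time, with both states fused per frame before upsampling. This is a toy rendition; flow-based feature alignment, which BasicVSR also uses, is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBiDirVSR(nn.Module):
    """Minimal bidirectional-propagation pipeline: forward and backward
    passes over time, fused per frame and upsampled with pixel shuffle."""
    def __init__(self, ch=16, scale=4):
        super().__init__()
        self.fwd = nn.Conv2d(3 + ch, ch, 3, padding=1)
        self.bwd = nn.Conv2d(3 + ch, ch, 3, padding=1)
        self.up = nn.Conv2d(2 * ch, 3 * scale * scale, 3, padding=1)
        self.scale, self.ch = scale, ch

    def forward(self, clip):                       # (B, T, 3, H, W)
        b, t, _, h, w = clip.shape
        hf = clip.new_zeros(b, self.ch, h, w)
        hb = clip.new_zeros(b, self.ch, h, w)
        fwd, bwd = [], []
        for i in range(t):                         # forward pass in time
            hf = torch.relu(self.fwd(torch.cat([clip[:, i], hf], 1)))
            fwd.append(hf)
        for i in reversed(range(t)):               # backward pass in time
            hb = torch.relu(self.bwd(torch.cat([clip[:, i], hb], 1)))
            bwd.append(hb)
        bwd.reverse()
        outs = [F.pixel_shuffle(self.up(torch.cat([f, bk], 1)), self.scale)
                for f, bk in zip(fwd, bwd)]
        return torch.stack(outs, dim=1)            # (B, T, 3, 4H, 4W)

sr = TinyBiDirVSR()(torch.rand(1, 5, 3, 32, 32))   # -> (1, 5, 3, 128, 128)
```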
arXiv Detail & Related papers (2020-12-03T18:56:14Z) - Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
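A compact sketch of contrastive training over aggregated frame-level features: a transformer encoder pools per-frame features into one video embedding, trained with an InfoNCE-style loss over a batch of (anchor, positive) video pairs. The encoder depth, feature size, and temperature are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Temporal aggregation: a transformer over frame-level features, mean-pooled
# into a single normalized video embedding.
enc = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True),
    num_layers=1)

def video_embedding(frame_feats):                  # (B, T, 128)
    return F.normalize(enc(frame_feats).mean(dim=1), dim=-1)

anchors = video_embedding(torch.rand(8, 16, 128))  # 8 videos, 16 frames each
positives = video_embedding(torch.rand(8, 16, 128))

# InfoNCE-style objective: each video's matching pair is its positive.
logits = anchors @ positives.t() / 0.07            # cosine similarity / temperature
loss = F.cross_entropy(logits, torch.arange(8))
```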
arXiv Detail & Related papers (2020-08-04T05:24:20Z) - Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution [95.26202278535543]
A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). However, temporal interpolation and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
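A toy one-stage sketch of this idea: the intermediate frame is synthesized in feature space rather than by a separate VFI network, and all three frames are upsampled by the same head. Module names and sizes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySTVSR(nn.Module):
    """One-stage space-time sketch: temporal interpolation happens at the
    FEATURE level, and the same head upsamples every frame to HR."""
    def __init__(self, ch=16, scale=4):
        super().__init__()
        self.feat = nn.Conv2d(3, ch, 3, padding=1)
        self.interp = nn.Conv2d(2 * ch, ch, 3, padding=1)    # feature-level interpolation
        self.up = nn.Conv2d(ch, 3 * scale * scale, 3, padding=1)
        self.scale = scale

    def forward(self, f0, f1):                               # two LFR, LR frames
        a, b = self.feat(f0), self.feat(f1)
        mid = torch.relu(self.interp(torch.cat([a, b], 1)))  # intermediate feature
        return [F.pixel_shuffle(self.up(x), self.scale) for x in (a, mid, b)]

hr0, hr_mid, hr1 = TinySTVSR()(torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32))
```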
arXiv Detail & Related papers (2020-02-26T16:59:48Z)