Time-series Initialization and Conditioning for Video-agnostic Stabilization of Video Super-Resolution using Recurrent Networks
- URL: http://arxiv.org/abs/2403.15832v1
- Date: Sat, 23 Mar 2024 13:16:07 GMT
- Title: Time-series Initialization and Conditioning for Video-agnostic Stabilization of Video Super-Resolution using Recurrent Networks
- Authors: Hiroshi Mori, Norimichi Ukita
- Abstract summary: A Recurrent Neural Network (RNN) for Video Super Resolution (VSR) is generally trained with randomly clipped and cropped short videos.
Since this RNN is optimized to super-resolve short videos, VSR of long videos is degraded due to the domain gap.
This paper proposes a training strategy for RNN-based VSR that works efficiently and stably regardless of video length and dynamics.
- Score: 13.894981567082997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A Recurrent Neural Network (RNN) for Video Super Resolution (VSR) is generally trained on short videos randomly clipped and cropped from the original training videos, owing to various challenges in learning RNNs. However, since such an RNN is optimized to super-resolve short videos, VSR of long videos degrades due to the domain gap. Our preliminary experiments reveal that this degradation varies with video properties such as length and dynamics. To avoid the degradation, this paper proposes a training strategy for RNN-based VSR that works efficiently and stably regardless of video length and dynamics. The proposed strategy stabilizes VSR by training the network with a variety of RNN hidden states that are varied according to the video properties. Since computing such a variety of hidden states is time-consuming, the computational cost is reduced by reusing hidden states during training. In addition, training stability is further improved with frame-number conditioning. Our experiments demonstrate that the proposed method outperforms baseline methods on videos with various lengths and dynamics.
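The hidden-state reuse and frame-number conditioning described in the abstract can be made concrete with a short sketch. Below is a minimal PyTorch-style illustration, not the authors' implementation: `VSRCell`, the per-clip `hidden_cache`, and the normalized frame-index input plane are hypothetical choices, and upsampling is omitted for brevity.

```python
# Hypothetical sketch of the described training strategy: an RNN-based VSR
# cell trained with cached hidden states (reused across passes over a clip)
# and frame-number conditioning. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VSRCell(nn.Module):
    """Toy recurrent VSR cell; upsampling is omitted for brevity."""
    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(3 + channels + 1, channels, 3, padding=1)
        self.body = nn.Sequential(nn.ReLU(), nn.Conv2d(channels, channels, 3, padding=1))
        self.to_sr = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, lr, hidden, frame_idx):
        # Frame-number conditioning: broadcast a normalized frame index as an
        # extra input plane (one plausible conditioning scheme).
        b, _, h, w = lr.shape
        idx = torch.full((b, 1, h, w), frame_idx / 100.0, device=lr.device)
        hidden = self.body(self.fuse(torch.cat([lr, hidden, idx], dim=1)))
        return self.to_sr(hidden), hidden

cell = VSRCell()
opt = torch.optim.Adam(cell.parameters(), lr=1e-4)
hidden_cache = {}  # clip_id -> (last hidden state, frames seen so far)

def train_step(clip_id, lr_frames, hr_frames):
    """Reuse the hidden state left by the previous pass over this clip, so
    states typical of long videos are obtained without recomputing them."""
    b, t, _, h, w = lr_frames.shape
    hidden, idx0 = hidden_cache.get(clip_id, (torch.zeros(b, 64, h, w), 0))
    hidden, loss = hidden.detach(), 0.0  # detach: truncated backprop through time
    for t_i in range(t):
        sr, hidden = cell(lr_frames[:, t_i], hidden, idx0 + t_i)
        loss = loss + F.l1_loss(sr, hr_frames[:, t_i])
    opt.zero_grad(); loss.backward(); opt.step()
    hidden_cache[clip_id] = (hidden.detach(), idx0 + t)
```

Reusing cached hidden states approximates the states the network would reach deep into a long video without recomputing them from frame zero, while the frame-index plane tells the network how far into the sequence the current state is.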
Related papers
- Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning [65.86184845073075]
Video-RTS is a new approach to improve video reasoning capability with drastically improved data efficiency.
We employ efficient pure-RL training with output-based rewards, requiring no additional annotations or extensive fine-tuning.
We validate our approach on multiple video reasoning benchmarks, showing that Video-RTS surpasses existing video reasoning models by an average of 2.4% in accuracy.
arXiv Detail & Related papers (2025-07-09T02:06:13Z)
- Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution [20.07870850150666]
Video super-resolution (VSR) can achieve better performance compared to single image super-resolution by leveraging temporal information.
We propose LRTI-VSR, a novel training framework for recurrent VSR that efficiently leverages Long-Range Refocused Temporal Information.
Our framework includes a generic training strategy that utilizes temporal propagation features from long video clips while training on shorter video clips.
arXiv Detail & Related papers (2025-05-04T15:46:34Z)
- Cascaded Temporal Updating Network for Efficient Video Super-Resolution [47.63267159007611]
Key components in recurrent-based VSR networks significantly impact model efficiency.
We propose a cascaded temporal updating network (CTUN) for efficient VSR.
CTUN achieves a favorable trade-off between efficiency and performance compared to existing methods.
arXiv Detail & Related papers (2024-08-26T12:59:32Z)
- Structured Sparsity Learning for Efficient Video Super-Resolution [99.1632164448236]
We develop a structured pruning scheme called Structured Sparsity Learning (SSL) according to the properties of video super-resolution (VSR) models.
In SSL, we design pruning schemes for several key components in VSR models, including residual blocks, recurrent networks, and upsampling networks.
arXiv Detail & Related papers (2022-06-15T17:36:04Z)
- Accelerating the Training of Video Super-Resolution [26.449738545078986]
We show that it is possible to gradually train video models from small to large spatial/temporal sizes in an easy-to-hard manner.
Our method is capable of largely speeding up training (up to $6.2\times$ speedup in wall-clock training time) without performance drop for various VSR models.
arXiv Detail & Related papers (2022-05-10T17:55:24Z)
- Learning Trajectory-Aware Transformer for Video Super-Resolution [50.49396123016185]
Video super-resolution aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts.
Existing approaches usually align and aggregate information from only a limited number of adjacent frames.
We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR).
arXiv Detail & Related papers (2022-04-08T03:37:39Z)
- Stable Long-Term Recurrent Video Super-Resolution [0.45880283710344055]
We introduce a new framework of recurrent VSR networks that is both stable and competitive, based on Lipschitz stability theory.
We propose a new recurrent VSR network, coined Middle Recurrent Video Super-Resolution (MRVSR), based on this framework.
arXiv Detail & Related papers (2021-12-16T15:12:52Z)
- Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate one of these tradeoffs, we propose a degradation scheme that reduces training time by up to 40% without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z)
- Self-Supervised Adaptation for Video Super-Resolution [7.26562478548988]
Single-image super-resolution (SISR) networks can adapt their network parameters to specific input images.
We present a new learning algorithm that allows conventional video super-resolution (VSR) networks to adapt their parameters to test video frames.
arXiv Detail & Related papers (2021-03-18T08:30:24Z)
- A Deep-Unfolded Reference-Based RPCA Network For Video Foreground-Background Separation [86.35434065681925]
This paper proposes a new deep-unfolding-based network design for the problem of Robust Principal Component Analysis (RPCA).
Unlike existing designs, our approach focuses on modeling the temporal correlation between the sparse representations of consecutive video frames.
Experimentation using the moving MNIST dataset shows that the proposed network outperforms a recently proposed state-of-the-art RPCA network in the task of video foreground-background separation.
arXiv Detail & Related papers (2020-10-02T11:40:09Z)
- Revisiting Temporal Modeling for Video Super-resolution [47.90584361677039]
We study and compare three temporal modeling methods (2D CNN with early fusion, 3D CNN with slow fusion, and Recurrent Neural Network) for video super-resolution.
We also propose a novel Recurrent Residual Network (RRN) for efficient video super-resolution, where residual learning is utilized to stabilize the training of RNN.
arXiv Detail & Related papers (2020-08-13T09:09:37Z)
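The RRN entry above uses residual learning to stabilize recurrent training; the sketch below illustrates the general idea under stated assumptions, with a hidden state updated through residual blocks so gradients have an identity shortcut. Class names and the x4 pixel-shuffle upsampler are hypothetical, not the RRN authors' code.

```python
# Minimal, hypothetical sketch of residual learning in a recurrent VSR cell:
# the hidden state is updated through residual blocks, giving gradients an
# identity path that helps stabilize RNN training. Not the RRN authors' code.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # Identity shortcut: output = input + learned residual.
        return x + self.conv2(torch.relu(self.conv1(x)))

class RecurrentResidualCell(nn.Module):
    def __init__(self, channels=64, num_blocks=5):
        super().__init__()
        self.inp = nn.Conv2d(3 + channels, channels, 3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])
        self.out = nn.Conv2d(channels, 3 * 4 * 4, 3, padding=1)  # x4 upscaling
        self.shuffle = nn.PixelShuffle(4)

    def forward(self, lr_frame, hidden):
        hidden = self.blocks(self.inp(torch.cat([lr_frame, hidden], dim=1)))
        return self.shuffle(self.out(hidden)), hidden

# Usage: propagate the hidden state across the frames of a video.
cell = RecurrentResidualCell()
lr_video = torch.randn(1, 10, 3, 32, 32)  # (batch, time, C, H, W)
hidden = torch.zeros(1, 64, 32, 32)
for t in range(lr_video.size(1)):
    sr_frame, hidden = cell(lr_video[:, t], hidden)  # sr_frame: (1, 3, 128, 128)
```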