BasicAVSR: Arbitrary-Scale Video Super-Resolution via Image Priors and Enhanced Motion Compensation
- URL: http://arxiv.org/abs/2510.26149v2
- Date: Thu, 06 Nov 2025 07:48:52 GMT
- Title: BasicAVSR: Arbitrary-Scale Video Super-Resolution via Image Priors and Enhanced Motion Compensation
- Authors: Wei Shang, Wanying Zhang, Shuhang Gu, Pengfei Zhu, Qinghua Hu, Dongwei Ren
- Abstract summary: We propose BasicAVSR for arbitrary-scale video super-resolution (AVSR). AVSR aims to enhance the resolution of video frames, potentially at various scaling factors. We show that BasicAVSR significantly outperforms existing methods in terms of super-resolution quality, generalization ability, and inference speed.
- Score: 70.27358326228399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Arbitrary-scale video super-resolution (AVSR) aims to enhance the resolution of video frames, potentially at various scaling factors, which presents several challenges regarding spatial detail reproduction, temporal consistency, and computational complexity. In this paper, we propose a strong baseline BasicAVSR for AVSR by integrating four key components: 1) adaptive multi-scale frequency priors generated from image Laplacian pyramids, 2) a flow-guided propagation unit to aggregate spatiotemporal information from adjacent frames, 3) a second-order motion compensation unit for more accurate spatial alignment of adjacent frames, and 4) a hyper-upsampling unit to generate scale-aware and content-independent upsampling kernels. To meet diverse application demands, we instantiate three propagation variants: (i) a unidirectional RNN unit for strictly online inference, (ii) a unidirectional RNN unit empowered with a limited lookahead that tolerates a small output delay, and (iii) a bidirectional RNN unit designed for offline tasks where computational resources are less constrained. Experimental results demonstrate the effectiveness and adaptability of our model across these different scenarios. Through extensive experiments, we show that BasicAVSR significantly outperforms existing methods in terms of super-resolution quality, generalization ability, and inference speed. Our work not only advances the state-of-the-art in AVSR but also extends its core components to multiple frameworks for diverse scenarios. The code is available at https://github.com/shangwei5/BasicAVSR.
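The abstract's first component builds frequency priors from image Laplacian pyramids. As a rough illustration of that decomposition only, here is a minimal NumPy sketch. It is not the authors' implementation: the function names are hypothetical, 2x2 average pooling and nearest-neighbour upsampling stand in for the usual Gaussian filtering, and the frame sides are assumed to be divisible by 2**levels.

```python
import numpy as np


def downsample(img):
    """Halve resolution with 2x2 average pooling (stand-in for blur + decimation)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


def laplacian_pyramid(img, levels):
    """Return [band_0, ..., band_{levels-1}, residual], finest band first."""
    bands = []
    current = img.astype(np.float64)
    for _ in range(levels):
        smaller = downsample(current)
        # Each band keeps the detail lost when going one level coarser.
        bands.append(current - upsample(smaller))
        current = smaller
    bands.append(current)  # low-frequency residual
    return bands


def reconstruct(pyramid):
    """Invert the decomposition by adding bands back from coarse to fine."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = upsample(current) + band
    return current


# Hypothetical single-channel 64x64 frame for demonstration.
frame = np.random.default_rng(0).random((64, 64))
pyr = laplacian_pyramid(frame, levels=3)
```

Each band isolates detail at one scale, and summing the bands back from coarse to fine recovers the input frame, which is why such pyramids are a convenient source of multi-scale frequency information.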
Related papers
- MVGSR: Multi-View Consistent 3D Gaussian Super-Resolution via Epipolar Guidance [13.050002358238793]
We introduce Multi-View Consistent 3D Gaussian Splatting Super-Resolution (MVGSR). MVGSR focuses on integrating multi-view information for 3DGS rendering with high-frequency details and enhanced consistency. Our method achieves state-of-the-art performance on both object-centric and scene-level 3DGS SR benchmarks.
arXiv Detail & Related papers (2025-12-17T03:23:12Z)
- LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning [73.90466023069125]
We propose LOVE-R1, a model that can adaptively zoom in on a video clip. The model is first provided with densely sampled frames at a low resolution. If more spatial detail is needed, the model can zoom in on a clip of interest at a higher frame resolution.
arXiv Detail & Related papers (2025-09-29T13:43:55Z)
- FCA2: Frame Compression-Aware Autoencoder for Modular and Fast Compressed Video Super-Resolution [68.77813885751308]
State-of-the-art (SOTA) compressed video super-resolution (CVSR) models face persistent challenges, including prolonged inference time, complex training pipelines, and reliance on auxiliary information. We propose an efficient and scalable solution inspired by the structural and statistical similarities between hyperspectral images (HSI) and video data. Our approach introduces a compression-driven dimensionality reduction strategy that reduces computational complexity, accelerates inference, and enhances the extraction of temporal information across frames.
arXiv Detail & Related papers (2025-06-13T07:59:52Z)
- DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer [56.98400572837792]
DiVE produces high-fidelity, temporally coherent, and cross-view consistent multi-view videos. These innovations collectively achieve a 2.62x speedup with minimal quality degradation.
arXiv Detail & Related papers (2025-04-28T09:20:50Z)
- RSRWKV: A Linear-Complexity 2D Attention Mechanism for Efficient Remote Sensing Vision Task [20.16344973940904]
High-resolution remote sensing analysis faces challenges due to scene complexity and scale diversity. We propose RSRWKV, featuring a novel 2D-WKV scanning mechanism that bridges sequential processing and 2D spatial reasoning.
arXiv Detail & Related papers (2025-03-26T10:03:46Z)
- Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors [80.92195378575671]
We describe a strong baseline for arbitrary-scale video super-resolution (AVSR).
We then introduce ST-AVSR by equipping our baseline with a multi-scale structural and textural prior computed from the pre-trained VGG network.
Comprehensive experiments show that ST-AVSR significantly improves super-resolution quality, generalization ability, and inference speed over the state-of-the-art.
arXiv Detail & Related papers (2024-07-13T15:27:39Z)
- Group-based Bi-Directional Recurrent Wavelet Neural Networks for Video Super-Resolution [4.9136996406481135]
Video super-resolution (VSR) aims to estimate a high-resolution (HR) frame from low-resolution (LR) frames.
A key challenge for VSR lies in effectively exploiting the spatial correlation within a frame and the temporal dependency between consecutive frames.
arXiv Detail & Related papers (2021-06-14T06:36:13Z)
- Large Motion Video Super-Resolution with Dual Subnet and Multi-Stage Communicated Upsampling [18.09730129484432]
Video super-resolution (VSR) aims to restore a low-resolution (LR) video to a higher resolution (HR).
In this paper, we propose a novel deep neural network with Dual Subnet and Multi-stage Communicated Upsampling (DSMC) for super-resolution of videos with large motion.
arXiv Detail & Related papers (2021-03-22T11:52:12Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frames are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z)