Trajectory-aware Shifted State Space Models for Online Video Super-Resolution
- URL: http://arxiv.org/abs/2508.10453v1
- Date: Thu, 14 Aug 2025 08:42:15 GMT
- Title: Trajectory-aware Shifted State Space Models for Online Video Super-Resolution
- Authors: Qiang Zhu, Xiandong Meng, Yuxian Jiang, Fan Zhang, David Bull, Shuyuan Zhu, Bing Zeng
- Abstract summary: This paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba). TS-Mamba first constructs trajectories within a video to select the most similar tokens from the previous frames. Our TS-Mamba achieves state-of-the-art performance in most cases with over 22.7% complexity reduction (in MACs).
- Score: 57.87099307245989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online video super-resolution (VSR) is an important technique for many real-world video processing applications; it aims to restore the current high-resolution video frame based on temporally previous frames. Most existing online VSR methods employ only one neighboring previous frame for temporal alignment, which limits long-range temporal modeling of videos. Recently, state space models (SSMs) have been proposed with linear computational complexity and a global receptive field, which significantly improve computational efficiency and performance. In this context, this paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba), leveraging both long-term trajectory modeling and low-complexity Mamba to achieve efficient spatio-temporal information aggregation. Specifically, TS-Mamba first constructs trajectories within a video to select the most similar tokens from previous frames. Then, a Trajectory-aware Shifted Mamba Aggregation (TSMA) module consisting of the proposed shifted SSM blocks is employed to aggregate the selected tokens. The shifted SSM blocks are designed based on Hilbert scanning and corresponding shift operations to compensate for scanning losses and strengthen the spatial continuity of Mamba. Additionally, we propose a trajectory-aware loss function to supervise trajectory generation, ensuring the accuracy of token selection when training our model. Extensive experiments on three widely used VSR test datasets demonstrate that, compared with six online VSR benchmark models, our TS-Mamba achieves state-of-the-art performance in most cases with over 22.7% complexity reduction (in MACs). The source code for TS-Mamba will be available at https://github.com.
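The abstract gives enough detail to illustrate the two core mechanisms: trajectory-guided selection of similar tokens from previous frames, and shifted SSM blocks that scan spatial tokens in Hilbert order with a shift operation to compensate for scanning losses. Since the source code is not yet released, the following PyTorch sketch is only an illustration under stated assumptions, not the authors' implementation: cosine similarity stands in for the paper's learned trajectory matching, `torch.roll` stands in for the shift operation, and the names `select_trajectory_tokens`, `shifted_hilbert_scan`, and the pluggable `ssm_block` are hypothetical.

```python
import torch
import torch.nn.functional as F

def hilbert_d2xy(n: int, d: int):
    """Map index d along a Hilbert curve covering an n x n grid (n a power of 2)
    to grid coordinates (x, y). Standard iterative d2xy conversion."""
    x = y = 0
    t, s = d, 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                                  # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_order(n: int) -> torch.Tensor:
    """Permutation taking raster-ordered token indices to Hilbert scan order."""
    idx = []
    for d in range(n * n):
        x, y = hilbert_d2xy(n, d)
        idx.append(y * n + x)
    return torch.tensor(idx, dtype=torch.long)

def select_trajectory_tokens(cur, prev, k=4):
    """For each current-frame token, gather the k most similar previous-frame
    tokens (cosine similarity here; the paper uses learned trajectories).
    cur: (B, N, C), prev: (B, M, C) -> (B, N, k, C)."""
    sim = F.normalize(cur, dim=-1) @ F.normalize(prev, dim=-1).transpose(1, 2)  # (B, N, M)
    idx = sim.topk(k, dim=-1).indices                                           # (B, N, k)
    B, N, C = cur.shape
    flat = idx.reshape(B, N * k, 1).expand(-1, -1, C)
    return prev.gather(1, flat).reshape(B, N, k, C)

def shifted_hilbert_scan(tokens, n, shift, ssm_block):
    """Reorder raster tokens (B, n*n, C) along the Hilbert curve, cyclically
    shift the 1-D sequence so different tokens become scan neighbours, run a
    sequence model over it, then undo the shift and the reordering."""
    order = hilbert_order(n).to(tokens.device)
    seq = tokens[:, order]                           # raster -> Hilbert order
    seq = torch.roll(seq, shifts=shift, dims=1)      # shifted scan variant
    seq = ssm_block(seq)                             # e.g. a Mamba layer
    seq = torch.roll(seq, shifts=-shift, dims=1)
    out = torch.empty_like(tokens)
    out[:, order] = seq                              # Hilbert -> raster order
    return out

if __name__ == "__main__":
    B, n, C = 2, 8, 16
    cur, prev = torch.randn(B, n * n, C), torch.randn(B, n * n, C)
    picked = select_trajectory_tokens(cur, prev, k=4)   # (2, 64, 4, 16)
    out = shifted_hilbert_scan(cur, n, 3, lambda s: s)  # identity "SSM"
    print(picked.shape, torch.allclose(out, cur))       # -> True
```

A full model would presumably stack several such blocks with different shifts and scan orientations and replace the identity `ssm_block` with a Mamba layer; the shift amount of 3 here is an arbitrary placeholder for the smoke test.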
Related papers
- Gather-Scatter Mamba: Accelerating Propagation with Efficient State Space Model [15.551773379039675]
State Space Models (SSMs) have historically played a central role in sequential modeling. Recent advances in selective SSMs like Mamba offer a compelling alternative. We propose a hybrid architecture that combines shifted-window self-attention for spatial context aggregation with Mamba-based selective scanning for efficient temporal propagation.
arXiv Detail & Related papers (2025-10-01T13:11:13Z)
- VSRM: A Robust Mamba-Based Framework for Video Super-Resolution [1.8506868409351092]
Video super-resolution remains a major challenge in low-level vision tasks. In this work, we propose VSRM, a novel framework for processing long sequences in video. VSRM achieves state-of-the-art results on diverse benchmarks, establishing itself as a solid foundation for future research.
arXiv Detail & Related papers (2025-06-28T05:51:42Z)
- MambaVSR: Content-Aware Scanning State Space Model for Video Super-Resolution [33.457410717030946]
We propose MambaVSR, the first state space model framework for video super-resolution. MambaVSR enables dynamic interactions through Shared Compass Construction (SCC) and Content-Aware Sequentialization (CAS). Building upon the SCC, the CAS module effectively aligns and aggregates non-local similar content across multiple frames by interleaving temporal features along the learned spatial order.
arXiv Detail & Related papers (2025-06-13T13:22:28Z)
- MLVTG: Mamba-Based Feature Alignment and LLM-Driven Purification for Multi-Modal Video Temporal Grounding [13.025856914576673]
Video Temporal Grounding aims to localize video clips corresponding to natural language queries. Existing Transformer-based methods often suffer from redundant attention and suboptimal multi-modal alignment. We propose MLVTG, a novel framework that integrates two key modules: MambaAligner and LLMRefiner.
arXiv Detail & Related papers (2025-06-10T07:20:12Z)
- STNMamba: Mamba-based Spatial-Temporal Normality Learning for Video Anomaly Detection [48.997518615379995]
Video anomaly detection (VAD) has been extensively researched due to its potential for intelligent video systems. Most existing methods based on CNNs and transformers still suffer from substantial computational burdens. We propose a lightweight and effective Mamba-based network named STNMamba to enhance the learning of spatial-temporal normality.
arXiv Detail & Related papers (2024-12-28T08:49:23Z)
- SIGMA: Selective Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba (SIGMA) for sequential recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z)
- Learning Trajectory-Aware Transformer for Video Super-Resolution [50.49396123016185]
Video super-resolution aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts.
Existing approaches usually align and aggregate information from only a limited number of adjacent frames.
We propose a novel trajectory-aware Transformer for Video Super-Resolution (TTVSR).
arXiv Detail & Related papers (2022-04-08T03:37:39Z)
- Optical-Flow-Reuse-Based Bidirectional Recurrent Network for Space-Time Video Super-Resolution [52.899234731501075]
Space-time video super-resolution (ST-VSR) simultaneously increases the spatial resolution and frame rate for a given video.
Existing methods typically struggle to efficiently leverage information from a large range of neighboring frames.
We propose a coarse-to-fine bidirectional recurrent neural network instead of using ConvLSTM to leverage knowledge between adjacent frames.
arXiv Detail & Related papers (2021-10-13T15:21:30Z)
- Temporal Modulation Network for Controllable Space-Time Video Super-Resolution [66.06549492893947]
Space-time video super-resolution (STVSR) aims to increase the spatial and temporal resolutions of low-resolution and low-frame-rate videos.
Deformable-convolution-based methods have achieved promising STVSR performance, but they can only infer intermediate frames pre-defined in the training stage.
We propose a Temporal Modulation Network (TMNet) to interpolate arbitrary intermediate frame(s) with accurate high-resolution reconstruction.
arXiv Detail & Related papers (2021-04-21T17:10:53Z)
- Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution [100.11355888909102]
Space-time video super-resolution aims at generating a high-resolution (HR) slow-motion video from a low-resolution (LR) and low frame rate (LFR) video sequence.
We present a one-stage space-time video super-resolution framework, which can directly reconstruct an HR slow-motion video sequence from an input LR and LFR video.
arXiv Detail & Related papers (2021-04-15T17:59:23Z)