No Labels, No Look-Ahead: Unsupervised Online Video Stabilization with Classical Priors
- URL: http://arxiv.org/abs/2602.23141v1
- Date: Thu, 26 Feb 2026 16:04:36 GMT
- Title: No Labels, No Look-Ahead: Unsupervised Online Video Stabilization with Classical Priors
- Authors: Tao Liu, Gang Wan, Kan Ren, Shibo Wen
- Abstract summary: We propose a new unsupervised framework for online video stabilization. Unlike methods based on deep learning that require paired stable and unstable datasets, our approach instantiates the classical stabilization pipeline with three stages. This design addresses three longstanding challenges in end-to-end learning: limited data, poor controllability, and inefficiency on hardware with constrained resources.
- Score: 13.656039162358086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new unsupervised framework for online video stabilization. Unlike methods based on deep learning that require paired stable and unstable datasets, our approach instantiates the classical stabilization pipeline with three stages and incorporates a multithreaded buffering mechanism. This design addresses three longstanding challenges in end-to-end learning: limited data, poor controllability, and inefficiency on hardware with constrained resources. Existing benchmarks focus mainly on handheld videos with a forward view in visible light, which restricts the applicability of stabilization to domains such as UAV nighttime remote sensing. To fill this gap, we introduce a new multimodal UAV aerial video dataset (UAV-Test). Experiments show that our method consistently outperforms state-of-the-art online stabilizers in both quantitative metrics and visual quality, while achieving performance comparable to offline methods.
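As a rough illustration of the classical three-stage pipeline the abstract refers to (inter-frame motion estimation, causal trajectory smoothing, and a compensating warp), a minimal Python/OpenCV sketch is given below. The specific choices here (Lucas-Kanade feature tracking, a partial-affine motion model, and an exponential moving-average smoother) are assumptions for illustration only and are not taken from the paper; the multithreaded buffering mechanism described in the abstract is omitted.

```python
import cv2
import numpy as np

def stabilize_online(video_path, alpha=0.9):
    """Yield stabilized frames one at a time using only past information."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    traj = np.zeros(3)    # accumulated camera trajectory: (x, y, angle)
    smooth = np.zeros(3)  # causally smoothed trajectory (no look-ahead)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Stage 1: estimate inter-frame motion from sparse tracked features.
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=30)
        m = None
        if pts is not None:
            nxt, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
            good = st.flatten() == 1
            if good.sum() >= 3:
                m, _ = cv2.estimateAffinePartial2D(pts[good], nxt[good])
        if m is None:  # fall back to identity motion if estimation fails
            m = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
        dx, dy = m[0, 2], m[1, 2]
        da = np.arctan2(m[1, 0], m[0, 0])
        traj += (dx, dy, da)

        # Stage 2: causal smoothing of the accumulated trajectory (EMA).
        smooth = alpha * smooth + (1.0 - alpha) * traj

        # Stage 3: warp the frame to cancel the jitter (raw minus smooth path).
        cx, cy, ca = smooth - traj
        comp = np.array([[np.cos(ca), -np.sin(ca), cx],
                         [np.sin(ca),  np.cos(ca), cy]], dtype=np.float32)
        h, w = frame.shape[:2]
        yield cv2.warpAffine(frame, comp, (w, h))

        prev_gray = gray
    cap.release()
```

A causal smoother such as the EMA above uses only past frames, which is the defining property of online (zero look-ahead) stabilization; offline methods instead smooth over the full trajectory after the whole video is available.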
Related papers
- FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing [97.35186681023025]
We introduce FFP-300K, a new large-scale dataset of high-fidelity video pairs at 720p resolution and 81 frames in length. We propose a novel framework designed for true guidance-free FFP that resolves the tension between maintaining first-frame appearance and preserving source video motion.
arXiv Detail & Related papers (2026-01-05T01:46:22Z)
- DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer [56.98400572837792]
DiVE produces high-fidelity, temporally coherent, and cross-view consistent multi-view videos. These innovations collectively achieve a 2.62x speedup with minimal quality degradation.
arXiv Detail & Related papers (2025-04-28T09:20:50Z)
- S3MOT: Monocular 3D Object Tracking with Selective State Space Model [3.5047603107971397]
Multi-object tracking in 3D space is essential for advancing robotics and computer vision applications. It remains a significant challenge in monocular setups due to the difficulty of mining 3D associations from 2D video streams. We present three innovative techniques to enhance the fusion of heterogeneous cues for monocular 3D MOT.
arXiv Detail & Related papers (2025-04-25T04:45:35Z)
- CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection [3.146076597280736]
Video anomaly detection (VAD) is a challenging problem in video surveillance, where anomalous frames must be localized in an untrimmed video.
We propose to utilize ViT-encoded visual features from CLIP, in contrast to the conventional C3D or I3D features used in the domain, to efficiently extract discriminative representations.
Our proposed CLIP-TSA outperforms the existing state-of-the-art (SOTA) methods by a large margin on three commonly-used benchmark datasets in the VAD problem.
arXiv Detail & Related papers (2022-12-09T22:28:24Z)
- Minimum Latency Deep Online Video Stabilization [77.68990069996939]
We present a novel camera path optimization framework for the task of online video stabilization.
In this work, we adopt recent off-the-shelf high-quality deep motion models for motion estimation to recover the camera trajectory.
Our approach significantly outperforms state-of-the-art online methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-12-05T07:37:32Z)
- Fast Online Video Super-Resolution with Deformable Attention Pyramid [172.16491820970646]
Video super-resolution (VSR) has many applications that pose strict causal, real-time, and latency constraints, including video streaming and TV.
We propose a recurrent VSR architecture based on a deformable attention pyramid (DAP).
arXiv Detail & Related papers (2022-02-03T17:49:04Z)
- Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z)
- DUT: Learning Video Stabilization by Simply Watching Unstable Videos [86.88635774560017]
We propose a Deep Unsupervised Trajectory-based stabilization framework (DUT).
DUT makes the first attempt to stabilize unstable videos by explicitly estimating and smoothing trajectories in an unsupervised deep learning manner.
Experiment results on public benchmarks show that DUT outperforms representative state-of-the-art methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-11-30T06:48:20Z)
- Diagnosing and Preventing Instabilities in Recurrent Video Processing [23.39527368516591]
We show that recurrent video processing models tend to fail catastrophically at inference time on long video sequences.
We introduce a diagnostic tool which produces adversarial input sequences optimized to trigger instabilities.
We then introduce Stable Rank Normalization of the Layers (SRNL), a new algorithm that enforces stability constraints on the recurrent layers.
arXiv Detail & Related papers (2020-10-10T21:39:28Z)