Harnessing Meta-Learning for Controllable Full-Frame Video Stabilization
- URL: http://arxiv.org/abs/2508.18859v1
- Date: Tue, 26 Aug 2025 09:38:29 GMT
- Title: Harnessing Meta-Learning for Controllable Full-Frame Video Stabilization
- Authors: Muhammad Kashif Ali, Eun Woo Im, Dongjin Kim, Tae Hyun Kim, Vivek Gupta, Haonan Luo, Tianrui Li
- Abstract summary: We present a novel method that improves pixel-level synthesis video stabilization methods by rapidly adapting models to each input video at test time. The proposed approach takes advantage of low-level visual cues available during inference to improve both the stability and visual quality of the output.
- Score: 27.960157360933636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video stabilization remains a fundamental problem in computer vision; pixel-level synthesis solutions, which synthesize full-frame outputs, add further complexity to the task. These methods aim to enhance stability while synthesizing full-frame videos, but the inherent diversity of motion profiles and visual content across video sequences makes robust generalization with fixed parameters difficult. To address this, we present a novel method that improves pixel-level synthesis video stabilization methods by rapidly adapting models to each input video at test time. The proposed approach takes advantage of low-level visual cues available during inference to improve both the stability and visual quality of the output. Notably, the proposed rapid adaptation achieves significant performance gains even with a single adaptation pass. We further propose a jerk localization module and a targeted adaptation strategy that focuses adaptation on high-jerk segments, maximizing stability with fewer adaptation steps. The proposed methodology enables modern stabilizers to outperform longstanding SOTA approaches while preserving the full-frame nature of modern methods and offering users control mechanisms akin to those of classical approaches. Extensive experiments on diverse real-world datasets demonstrate the versatility of the proposed method. Our approach consistently improves the performance of various full-frame synthesis models in both qualitative and quantitative terms, including results on downstream applications.
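As a rough illustration of the idea, below is a minimal sketch of what jerk localization and single-pass test-time adaptation could look like. All names (`locate_high_jerk_frames`, `adapt_on_segment`, the photometric smoothness loss) are hypothetical stand-ins rather than the paper's actual API; the sketch assumes jerk is measured as the third temporal derivative of a 2D camera trajectory.

```python
# Minimal sketch (hypothetical names, not the paper's API): jerk-based
# segment selection followed by a short self-supervised adaptation pass.
import numpy as np
import torch

def locate_high_jerk_frames(trajectory: np.ndarray, quantile: float = 0.9) -> np.ndarray:
    """Flag frames whose jerk (the third temporal derivative of a per-frame
    2D camera trajectory of shape (T, 2)) exceeds a quantile threshold."""
    jerk = np.linalg.norm(np.diff(trajectory, n=3, axis=0), axis=1)  # (T-3,)
    high = jerk > np.quantile(jerk, quantile)
    return np.flatnonzero(high) + 2  # shift back toward original frame indices

def adapt_on_segment(model: torch.nn.Module, frames: torch.Tensor,
                     steps: int = 1, lr: float = 1e-5) -> torch.nn.Module:
    """Run a few gradient steps of test-time adaptation on one high-jerk
    segment, using a low-level photometric cue as the self-supervised loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        stabilized = model(frames)  # full-frame synthesis stabilizer
        # Assumed stand-in loss: penalize residual frame-to-frame change
        # in the stabilized output (a crude stability/photometric cue).
        loss = (stabilized[1:] - stabilized[:-1]).abs().mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```

In this sketch, the targeted strategy would call `adapt_on_segment` only on windows around the indices returned by `locate_high_jerk_frames`, consistent with the abstract's goal of maximizing stability with fewer adaptation steps.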
Related papers
- Zero-Shot Video Translation and Editing with Frame Spatial-Temporal Correspondence [81.82643953694485]
We present FRESCO, which integrates intra-frame correspondence with inter-frame correspondence to formulate a more robust spatial-temporal constraint. Our method goes beyond attention guidance to explicitly optimize features, achieving high spatial-temporal consistency with the input video. We validate FRESCO on two zero-shot tasks: video-to-video translation and text-guided video editing.
arXiv Detail & Related papers (2025-12-03T15:51:11Z) - POSE: Phased One-Step Adversarial Equilibrium for Video Diffusion Models [18.761042377485367]
POSE (Phased One-Step Adversarial Equilibrium) is a distillation framework that reduces the sampling steps of large-scale video diffusion models. We show that POSE outperforms other acceleration methods on VBench-I2V by an average of 7.15% in semantic alignment, temporal consistency, and frame quality.
arXiv Detail & Related papers (2025-08-28T17:20:01Z) - FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis [9.900921417459324]
We propose FreePCA, a training-free long video generation paradigm based on Principal Component Analysis (PCA). We decouple consistent appearance and motion intensity features by measuring cosine similarity in the principal component space. Experiments demonstrate that FreePCA can be applied to various video diffusion models without requiring training, leading to substantial improvements.
arXiv Detail & Related papers (2025-05-02T10:27:58Z) - Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling [81.37449968164692]
We propose Synchronized Coupled Sampling (SynCoS), a novel inference framework that synchronizes denoising paths across the entire video. Our approach combines two complementary sampling strategies, which ensure seamless local transitions and enforce global coherence. Extensive experiments show that SynCoS significantly improves multi-event long video generation, achieving smoother transitions and superior long-range coherence.
arXiv Detail & Related papers (2025-03-11T16:43:45Z) - Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition [52.89441679581216]
Low-Light Video Enhancement (LLVE) seeks to restore dynamic or static scenes plagued by severe invisibility and noise. We present an innovative video decomposition strategy that incorporates view-independent and view-dependent components. Our framework consistently outperforms existing methods, establishing new SOTA performance.
arXiv Detail & Related papers (2024-05-24T15:56:40Z) - Harnessing Meta-Learning for Improving Full-Frame Video Stabilization [8.208892438376388]
We introduce a novel approach to enhance the performance of pixel-level synthesis solutions for video stabilization by adapting these models to individual input video sequences.
The proposed adaptation exploits low-level visual cues during test-time to improve both the stability and quality of resulting videos.
arXiv Detail & Related papers (2024-03-06T12:31:02Z) - Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences.
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z) - Fast Full-frame Video Stabilization with Iterative Optimization [21.962533235492625]
We propose an iterative optimization-based learning approach using synthetic datasets for video stabilization.
We develop a two-level (coarse-to-fine) stabilizing algorithm based on the probabilistic flow field.
We take a divide-and-conquer approach and propose a novel multiframe fusion strategy to render full-frame stabilized views.
arXiv Detail & Related papers (2023-07-24T13:24:19Z) - Motion-Conditioned Diffusion Model for Controllable Video Synthesis [75.367816656045]
We introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes.
We show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis.
arXiv Detail & Related papers (2023-04-27T17:59:32Z) - Neural Re-rendering for Full-frame Video Stabilization [144.9918806873405]
We present an algorithm for full-frame video stabilization by first estimating dense warp fields.
Full-frame stabilized frames can then be synthesized by fusing warped contents from neighboring frames.
arXiv Detail & Related papers (2021-02-11T18:59:45Z) - Deep Motion Blind Video Stabilization [4.544151613454639]
This work aims to declutter this over-complicated formulation of video stabilization with the help of a novel dataset.
We successfully learn motion-blind, full-frame video stabilization by employing strictly conventional generative techniques.
Our method achieves a $\sim 3\times$ speed-up over the fastest currently available video stabilization methods.
arXiv Detail & Related papers (2020-11-19T07:26:06Z)