Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks
- URL: http://arxiv.org/abs/2512.03014v1
- Date: Tue, 02 Dec 2025 18:41:10 GMT
- Title: Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks
- Authors: Matthew Dutson, Nathan Labiosa, Yin Li, Mohit Gupta
- Abstract summary: We introduce a general approach for adapting frame-based models for stable and robust inference on video. We describe a class of stability adapters that can be inserted into virtually any architecture and a resource-efficient training process that can be performed with a frozen base network.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When applied sequentially to video, frame-based networks often exhibit temporal inconsistency - for example, outputs that flicker between frames. This problem is amplified when the network inputs contain time-varying corruptions. In this work, we introduce a general approach for adapting frame-based models for stable and robust inference on video. We describe a class of stability adapters that can be inserted into virtually any architecture and a resource-efficient training process that can be performed with a frozen base network. We introduce a unified conceptual framework for describing temporal stability and corruption robustness, centered on a proposed accuracy-stability-robustness loss. By analyzing the theoretical properties of this loss, we identify the conditions where it produces well-behaved stabilizer training. Our experiments validate our approach on several vision tasks including denoising (NAFNet), image enhancement (HDRNet), monocular depth (Depth Anything v2), and semantic segmentation (DeepLabv3+). Our method improves temporal stability and robustness against a range of image corruptions (including compression artifacts, noise, and adverse weather), while preserving or improving the quality of predictions.
Related papers
- Video Dynamics Prior: An Internal Learning Approach for Robust Video
Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns the weights of neural modules by optimizing over a corrupted test sequence, leveraging the spatio-temporal coherence and internal statistics of the video.
arXiv Detail & Related papers (2023-12-13T01:57:11Z) - Ambiguity in solving imaging inverse problems with deep learning based
operators [0.0]
Large convolutional neural networks have been widely used as tools for image deblurring.
Image deblurring is mathematically modeled as an ill-posed inverse problem and its solution is difficult to approximate when noise affects the data.
In this paper, we propose some strategies to improve stability without losing too much accuracy when deblurring images with deep-learning-based methods.
arXiv Detail & Related papers (2023-05-31T12:07:08Z) - GPU-accelerated SIFT-aided source identification of stabilized videos [63.084540168532065]
We exploit the parallelization capabilities of Graphics Processing Units (GPUs) in the framework of stabilised frames inversion.
We propose to exploit SIFT features to estimate the camera momentum and to identify less stabilized temporal segments.
Experiments confirm the effectiveness of the proposed approach in reducing the required computational time and improving the source identification accuracy.
arXiv Detail & Related papers (2022-07-29T07:01:31Z) - AuxAdapt: Stable and Efficient Test-Time Adaptation for Temporally
Consistent Video Semantic Segmentation [81.87943324048756]
In video segmentation, generating temporally consistent results across frames is as important as achieving frame-wise accuracy.
Existing methods rely on optical flow regularization or fine-tuning with test data to attain temporal consistency.
This paper presents an efficient, intuitive, and unsupervised online adaptation method, AuxAdapt, for improving the temporal consistency of most neural network models.
arXiv Detail & Related papers (2021-10-24T07:07:41Z) - Improving robustness against common corruptions with frequency biased
models [112.65717928060195]
Unseen image corruptions can cause a surprisingly large drop in performance.
Image corruption types have different characteristics in the frequency spectrum and would benefit from a targeted type of data augmentation.
We propose a new regularization scheme that minimizes the total variation (TV) of convolution feature-maps to increase high-frequency robustness.
arXiv Detail & Related papers (2021-03-30T10:44:50Z) - Neural Re-rendering for Full-frame Video Stabilization [144.9918806873405]
We present an algorithm for full-frame video stabilization by first estimating dense warp fields.
Full-frame stabilized frames can then be synthesized by fusing warped contents from neighboring frames.
arXiv Detail & Related papers (2021-02-11T18:59:45Z) - Intrinsic Temporal Regularization for High-resolution Human Video
Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation.
We apply our intrinsic temporal regulation to a single-image generator, leading to a powerful "INTERnet" capable of generating $512 \times 512$ resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z) - A Backbone Replaceable Fine-tuning Framework for Stable Face Alignment [21.696696531924374]
We propose a Jitter loss function that leverages temporal information to suppress inaccurate as well as jittered landmarks.
The proposed framework achieves at least 40% improvement on stability evaluation metrics.
It can swiftly convert a landmark detector for facial images to a better-performing one for videos without retraining the entire model.
arXiv Detail & Related papers (2020-10-19T13:40:39Z)