PrAViC: Probabilistic Adaptation Framework for Real-Time Video Classification
- URL: http://arxiv.org/abs/2406.11443v2
- Date: Wed, 13 Aug 2025 09:09:27 GMT
- Title: PrAViC: Probabilistic Adaptation Framework for Real-Time Video Classification
- Authors: Magdalena Trędowicz, Marcin Mazur, Szymon Janusz, Arkadiusz Lewicki, Jacek Tabor, Łukasz Struski
- Abstract summary: PrAViC is a novel, unified, and theoretically grounded adaptation framework for tackling the online classification problem in video data. PrAViC is evaluated by comparing it with existing state-of-the-art offline and online models on several datasets.
- Score: 7.380324916960336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video processing is generally divided into two main categories: processing of the entire video, which typically yields optimal classification outcomes, and real-time processing, where the objective is to make a decision as promptly as possible. Although the models dedicated to the processing of entire videos are typically well-defined and clearly presented in the literature, this is not the case for online processing, where a plethora of hand-devised methods exist. To address this issue, we present PrAViC, a novel, unified, and theoretically grounded adaptation framework for tackling the online classification problem in video data. The initial phase of our study is to establish a mathematical background for the classification of sequential data, with the potential to make a decision at an early stage. This allows us to construct a natural function that encourages the model to return a result much faster. The subsequent phase is to present a straightforward and readily implementable method for adapting offline models to the online setting using recurrent operations. Finally, PrAViC is evaluated by comparing it with existing state-of-the-art offline and online models on several datasets. The evaluation shows that the adapted network significantly reduces the time required to reach classification decisions while maintaining, or even enhancing, accuracy.
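The core idea in the abstract, processing frames sequentially and allowing the classifier to commit to a decision early once it is confident, can be illustrated with a minimal sketch. This is not PrAViC's actual recurrent construction or its early-exit function; the running-average confidence rule, threshold, and function names below are assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def online_classify(frame_logits, threshold=0.8):
    """Scan per-frame class logits in order and return (label, exit_frame)
    as soon as the running prediction is confident enough.

    frame_logits: array of shape (T, C), one logit vector per frame (T >= 1).
    threshold: confidence needed to commit early (an illustrative rule,
    not the decision function proposed in the paper).
    """
    running = np.zeros(frame_logits.shape[1])
    for t, logits in enumerate(frame_logits):
        running += logits                       # accumulate evidence over time
        probs = softmax(running / (t + 1))      # average logits seen so far
        if probs.max() >= threshold:
            return int(probs.argmax()), t + 1   # early exit
    return int(probs.argmax()), frame_logits.shape[0]  # saw the whole clip

# A toy 3-class clip whose evidence increasingly favors class 2.
logits = np.array([[0., 0., 1.],
                   [0., 0., 3.],
                   [0., 0., 3.],
                   [0., 0., 3.]])
label, used = online_classify(logits, threshold=0.8)
print(label, used)  # → 2 3: commits to class 2 after 3 of the 4 frames
```

The trade-off the paper studies lives in the threshold: lowering it yields earlier decisions at the risk of lower accuracy, while a threshold of 1.0 degenerates to offline (whole-clip) classification.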
Related papers
- End-to-End Training for Autoregressive Video Diffusion via Self-Resampling [63.84672807009907]
Autoregressive video diffusion models hold promise for world simulation but are vulnerable to exposure bias arising from the train-test mismatch. We introduce Resampling Forcing, a teacher-free framework that enables training autoregressive video models from scratch and at scale.
arXiv Detail & Related papers (2025-12-17T18:53:29Z) - MPRU: Modular Projection-Redistribution Unlearning as Output Filter for Classification Pipelines [23.370444162993707]
We propose an inductive approach to machine unlearning (MU). Unlearning is done by reversing the last training sequence, implemented by appending a projection-redistribution layer at the end of the model. Experimental results show output consistently similar to that of a fully retrained model, at a greatly reduced computational cost.
arXiv Detail & Related papers (2025-10-30T08:09:37Z) - Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation. We present a taxonomy that categorizes such information-manipulation approaches across four key dimensions. We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z) - SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning [78.44705665291741]
We present a comprehensive evaluation of modern video self-supervised models.
We focus on generalization across four key downstream factors: domain shift, sample efficiency, action granularity, and task diversity.
Our analysis shows that, despite architectural advances, transformer-based models remain sensitive to downstream conditions.
arXiv Detail & Related papers (2025-04-08T06:00:28Z) - Bayesian Test-Time Adaptation for Vision-Language Models [51.93247610195295]
Test-time adaptation with pre-trained vision-language models, such as CLIP, aims to adapt the model to new, potentially out-of-distribution test data.
We propose a novel approach, Bayesian Class Adaptation (BCA), which continuously updates class embeddings to adapt the likelihood, and additionally uses the posterior of incoming samples to continuously update the prior for each class embedding.
arXiv Detail & Related papers (2025-03-12T10:42:11Z) - ODEStream: A Buffer-Free Online Learning Framework with ODE-based Adaptor for Streaming Time Series Forecasting [11.261457967759688]
ODEStream is a buffer-free continual learning framework that incorporates a temporal isolation layer to capture temporal dependencies within the data. It generates a continuous data representation, enabling seamless adaptation to changing dynamics in a data streaming scenario. Our approach focuses on learning how the dynamics and distribution of historical data change over time, facilitating direct processing of streaming sequences.
arXiv Detail & Related papers (2024-11-11T22:36:33Z) - Random Representations Outperform Online Continually Learned Representations [68.42776779425978]
We show that existing online continually trained deep networks produce inferior representations compared to a simple pre-defined random transforms.
Our method, called RanDumb, significantly outperforms state-of-the-art continually learned representations across all online continual learning benchmarks.
Our study reveals the significant limitations of representation learning, particularly in low-exemplar and online continual learning scenarios.
arXiv Detail & Related papers (2024-02-13T22:07:29Z) - Learning from One Continuous Video Stream [70.30084026960819]
We introduce a framework for online learning from a single continuous video stream.
This poses great challenges given the high correlation between consecutive video frames.
We employ pixel-to-pixel modelling as a practical and flexible way to switch between pre-training and single-stream evaluation.
arXiv Detail & Related papers (2023-12-01T14:03:30Z) - Adaptive Training Distributions with Scalable Online Bilevel
Optimization [26.029033134519604]
Large neural networks pretrained on web-scale corpora are central to modern machine learning.
This work considers modifying the pretraining distribution in the case where one has a small sample of data reflecting the targeted test conditions.
We propose an algorithm motivated by a recent formulation of this setting as an online, bilevel optimization problem.
arXiv Detail & Related papers (2023-11-20T18:01:29Z) - TAPIR: Tracking Any Point with per-frame Initialization and temporal
Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
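The two-stage design described above can be illustrated with a toy per-frame point matcher: stage one independently picks the best-scoring location for the query feature in each frame, and stage two nudges each estimate using scores in a local window. This is a schematic sketch, not TAPIR's architecture; the dot-product scoring and the softmax-centroid refinement are stand-in assumptions for the learned components.

```python
import numpy as np

def match_stage(query, frames):
    """Stage 1: independently find the best-matching pixel in each frame.
    query: (C,) feature vector; frames: (T, H, W, C) feature maps.
    Returns integer (y, x) coordinates of shape (T, 2)."""
    scores = np.einsum("thwc,c->thw", frames, query)   # per-pixel dot products
    flat = scores.reshape(len(frames), -1).argmax(axis=1)
    return np.stack(np.unravel_index(flat, scores.shape[1:]), axis=1)

def refine_stage(frames, query, coarse, radius=1):
    """Stage 2: shift each coarse estimate to the score-weighted centroid
    of a small local window (a toy stand-in for learned local refinement)."""
    refined = []
    for t, (y, x) in enumerate(coarse):
        ys = range(max(0, y - radius), min(frames.shape[1], y + radius + 1))
        xs = range(max(0, x - radius), min(frames.shape[2], x + radius + 1))
        pts = [(yy, xx) for yy in ys for xx in xs]
        s = np.array([frames[t, yy, xx] @ query for yy, xx in pts])
        s = np.exp(s - s.max())                         # softmax over the window
        refined.append((s[:, None] * np.array(pts)).sum(0) / s.sum())
    return np.array(refined)                            # sub-pixel (T, 2)
```

The point of the split is that stage one is cheap and fully parallel across frames, while stage two recovers sub-pixel precision from purely local information.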
arXiv Detail & Related papers (2023-06-14T17:07:51Z) - SimOn: A Simple Framework for Online Temporal Action Localization [51.27476730635852]
We propose a framework, termed SimOn, that learns to predict action instances using the popular Transformer architecture.
Experimental results on the THUMOS14 and ActivityNet1.3 datasets show that our model remarkably outperforms the previous methods.
arXiv Detail & Related papers (2022-11-08T04:50:54Z) - Direct Embedding of Temporal Network Edges via Time-Decayed Line Graphs [51.51417735550026]
Methods for machine learning on temporal networks generally exhibit at least one of two limitations.
We present a simple method that avoids both shortcomings: construct the line graph of the network, which includes a node for each interaction, and weigh the edges of this graph based on the difference in time between interactions.
Empirical results on real-world networks demonstrate our method's efficacy and efficiency on both edge classification and temporal link prediction.
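The construction summarized above, one line-graph node per interaction with edges weighted by the time gap between interactions that share an endpoint, can be sketched in a few lines. The exponential decay kernel and the `decay` parameter below are illustrative assumptions; the paper's exact weighting may differ.

```python
import math
from itertools import combinations

def time_decayed_line_graph(interactions, decay=1.0):
    """Build the time-decayed line graph of a temporal network.

    interactions: list of (u, v, t) timestamped edges of the original network.
    Returns a dict mapping pairs of interaction indices (i, j), i < j, to a
    weight that decays with the time gap (an illustrative exp(-decay*|dt|)
    kernel). Two interactions are connected iff they share an endpoint.
    """
    weights = {}
    for (i, (u1, v1, t1)), (j, (u2, v2, t2)) in combinations(enumerate(interactions), 2):
        if {u1, v1} & {u2, v2}:          # shared endpoint in the original graph
            weights[(i, j)] = math.exp(-decay * abs(t1 - t2))
    return weights

edges = [("a", "b", 0.0), ("b", "c", 1.0), ("c", "d", 5.0)]
w = time_decayed_line_graph(edges)
# interactions 0 and 1 share node "b"; 1 and 2 share "c"; 0 and 2 share nothing
```

Once the interactions themselves are nodes, any static node-embedding method can be run on this weighted graph, which is what lets the approach sidestep specialized temporal-network machinery.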
arXiv Detail & Related papers (2022-09-30T18:24:13Z) - Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z) - AuxAdapt: Stable and Efficient Test-Time Adaptation for Temporally
Consistent Video Semantic Segmentation [81.87943324048756]
In video segmentation, generating temporally consistent results across frames is as important as achieving frame-wise accuracy.
Existing methods rely on optical flow regularization or fine-tuning with test data to attain temporal consistency.
This paper presents an efficient, intuitive, and unsupervised online adaptation method, AuxAdapt, for improving the temporal consistency of most neural network models.
arXiv Detail & Related papers (2021-10-24T07:07:41Z) - Network Estimation by Mixing: Adaptivity and More [2.3478438171452014]
We propose a mixing strategy that leverages available arbitrary models to improve their individual performances.
The proposed method is computationally efficient and almost tuning-free.
We show that the proposed method performs equally well as the oracle estimate when the true model is included as individual candidates.
arXiv Detail & Related papers (2021-06-05T05:17:04Z) - Online Feature Screening for Data Streams with Concept Drift [8.807587076209566]
This research study focuses on classification datasets.
Our experiments show proposed methods can generate the same feature importance as their offline version with faster speed and less storage consumption.
The results show that online screening methods with integrated model adaptation have a higher true feature detection rate than without model adaptation on data streams with the concept drift property.
arXiv Detail & Related papers (2021-04-07T03:16:15Z) - Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
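A generic flavor of the piecewise-linear fitting mentioned above can be sketched with a hinge-basis least-squares fit. This assumes the knot locations are given; the paper's algorithm, which must also choose breakpoints and produce human-readable code, is more involved, so treat this purely as an illustration of the representation.

```python
import numpy as np

def fit_piecewise_linear(x, y, knots):
    """Least-squares fit of a continuous piecewise-linear curve with fixed
    knot locations, using a hinge basis: 1, x, max(0, x - k) per knot."""
    basis = np.column_stack([np.ones_like(x), x] +
                            [np.maximum(0.0, x - k) for k in knots])
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return coef

def predict(x, knots, coef):
    """Evaluate the fitted curve at new points x."""
    basis = np.column_stack([np.ones_like(x), x] +
                            [np.maximum(0.0, x - k) for k in knots])
    return basis @ coef
```

Each hinge term `max(0, x - k)` adds a slope change at knot `k` while keeping the curve continuous, which is what makes the fitted coefficients directly readable as per-segment slopes.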
arXiv Detail & Related papers (2021-01-21T01:46:36Z) - A Flexible Selection Scheme for Minimum-Effort Transfer Learning [27.920304852537534]
Fine-tuning is a popular way of exploiting knowledge contained in a pre-trained convolutional network for a new visual recognition task.
We introduce a new form of fine-tuning, called flex-tuning, in which any individual unit of a network can be tuned.
We show that fine-tuning individual units, despite its simplicity, yields very good results as an adaptation technique.
arXiv Detail & Related papers (2020-08-27T08:57:30Z) - Fast Template Matching and Update for Video Object Tracking and
Segmentation [56.465510428878]
The main task we aim to tackle is the multi-instance semi-supervised video object segmentation across a sequence of frames.
The challenges lie in the selection of the matching method to predict the result as well as to decide whether to update the target template.
We propose a novel approach which utilizes reinforcement learning to make these two decisions at the same time.
arXiv Detail & Related papers (2020-04-16T08:58:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.