Camera Movement Classification in Historical Footage: A Comparative Study of Deep Video Models
- URL: http://arxiv.org/abs/2510.14713v1
- Date: Thu, 16 Oct 2025 14:11:52 GMT
- Title: Camera Movement Classification in Historical Footage: A Comparative Study of Deep Video Models
- Authors: Tingyu Lin, Armin Dadras, Florian Kleber, Robert Sablatnig
- Abstract summary: This paper presents the first systematic evaluation of deep video CMC models on archival film material. Five standard video classification models are assessed on the HISTORIAN dataset, which includes expert-annotated World War II footage. The best-performing model, Video Swin Transformer, achieves 80.25% accuracy, showing strong convergence despite limited training data.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Camera movement conveys spatial and narrative information essential for understanding video content. While recent camera movement classification (CMC) methods perform well on modern datasets, their generalization to historical footage remains unexplored. This paper presents the first systematic evaluation of deep video CMC models on archival film material. We summarize representative methods and datasets, highlighting differences in model design and label definitions. Five standard video classification models are assessed on the HISTORIAN dataset, which includes expert-annotated World War II footage. The best-performing model, Video Swin Transformer, achieves 80.25% accuracy, showing strong convergence despite limited training data. Our findings highlight the challenges and potential of adapting existing models to low-quality video and motivate future work combining diverse input modalities and temporal architectures.
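The abstract frames CMC as assigning motion labels (e.g. pan, tilt, static) to video clips. As a purely illustrative sketch of that labeling task, not the paper's deep-learning approach (which fine-tunes models such as Video Swin Transformer), a minimal rule-based baseline over per-frame global motion vectors might look like the following; the threshold value and label names are hypothetical:

```python
# Minimal rule-based sketch of camera movement classification (CMC).
# NOT the paper's method: the paper fine-tunes deep video models on HISTORIAN.
# This only illustrates the task of mapping estimated global motion to labels.

def classify_movement(motion_vectors, threshold=0.5):
    """Classify camera movement from per-frame-pair global displacements.

    motion_vectors: list of (dx, dy) tuples, one per consecutive frame pair,
    e.g. as estimated by optical flow. Threshold is a hypothetical value.
    """
    if not motion_vectors:
        return "static"
    n = len(motion_vectors)
    mean_dx = sum(dx for dx, _ in motion_vectors) / n
    mean_dy = sum(dy for _, dy in motion_vectors) / n
    # Negligible average motion in both axes: treat the camera as static.
    if abs(mean_dx) < threshold and abs(mean_dy) < threshold:
        return "static"
    # Dominant horizontal motion -> pan; dominant vertical motion -> tilt.
    if abs(mean_dx) >= abs(mean_dy):
        return "pan_right" if mean_dx > 0 else "pan_left"
    return "tilt_down" if mean_dy > 0 else "tilt_up"

print(classify_movement([(1.2, 0.1), (0.9, -0.2), (1.1, 0.0)]))  # pan_right
print(classify_movement([(0.1, -0.9), (0.0, -1.1)]))             # tilt_up
```

A deep model replaces the hand-set threshold and mean-motion heuristic with features learned end to end, which is what makes it more robust to the noise, scratches, and low contrast of archival footage.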
Related papers
- Generative Video Matting [57.186684844156595]
Video matting has traditionally been limited by the lack of high-quality ground-truth data. Most existing video matting datasets provide only human-annotated imperfect alpha and foreground annotations. We introduce a novel video matting approach that can effectively leverage the rich priors from pre-trained video diffusion models.
arXiv Detail & Related papers (2025-08-11T12:18:55Z)
- SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning [78.44705665291741]
We present a comprehensive evaluation of modern video self-supervised models. We focus on generalization across four key downstream factors: domain shift, sample efficiency, action granularity, and task diversity. Our analysis shows that, despite architectural advances, transformer-based models remain sensitive to downstream conditions.
arXiv Detail & Related papers (2025-04-08T06:00:28Z)
- Video Decomposition Prior: A Methodology to Decompose Videos into Layers [74.36790196133505]
This paper introduces a novel video decomposition prior (VDP) framework which derives inspiration from professional video editing practices. The VDP framework decomposes a video sequence into a set of multiple RGB layers and associated opacity levels. We address tasks such as video object segmentation, dehazing, and relighting.
arXiv Detail & Related papers (2024-12-06T10:35:45Z)
- Video Diffusion Models: A Survey [3.7985353171858045]
Diffusion generative models have recently become a powerful technique for creating and modifying high-quality, coherent video content.
This survey provides an overview of the critical components of diffusion models for video generation, including their applications, architectural design, and temporal dynamics modeling.
arXiv Detail & Related papers (2024-05-06T04:01:42Z)
- Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition [62.85802939587308]
This paper focuses on exploring Class Incremental Audio-Visual Video Recognition (CIAVVR).
Since both the stored data and the learned models of past classes contain historical knowledge, the core challenge is how to capture past data knowledge and past model knowledge to prevent catastrophic forgetting.
We introduce Hierarchical Augmentation and Distillation (HAD), which comprises the Hierarchical Augmentation Module (HAM) and Hierarchical Distillation Module (HDM) to efficiently utilize the hierarchical structure of data and models.
arXiv Detail & Related papers (2024-01-11T23:00:24Z)
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets [36.95521842177614]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
arXiv Detail & Related papers (2023-11-25T22:28:38Z)
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that it is hard to judge large conditional generative models using simple metrics, since these models are often trained on very large datasets and have multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics.
arXiv Detail & Related papers (2023-10-17T17:50:46Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- Video Summarization Based on Video-text Modelling [0.0]
We propose a multimodal self-supervised learning framework to obtain semantic representations of videos.
We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.
An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
arXiv Detail & Related papers (2022-01-07T15:21:46Z)