Optimizing video analytics inference pipelines: a case study

URL: http://arxiv.org/abs/2512.07009v1
Date: Sun, 07 Dec 2025 21:17:53 GMT
Title: Optimizing video analytics inference pipelines: a case study
Authors: Saeid Ghafouri, Yuming Ding, Katerine Diaz Chito, Jesús Martinez del Rincón, Niamh O'Connell, Hans Vandierendonck
Abstract summary: This paper presents a comprehensive case study on optimizing a poultry welfare monitoring system. We introduce a set of optimizations, including multi-level parallelization, replacing CPU code with GPU-accelerated code, vectorized clustering, and memory-efficient post-processing.
Score: 3.4152678224558333
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cost-effective and scalable video analytics are essential for precision livestock monitoring, where high-resolution footage and near-real-time monitoring needs from commercial farms generate substantial computational workloads. This paper presents a comprehensive case study on optimizing a poultry welfare monitoring system through system-level improvements across detection, tracking, clustering, and behavioral analysis modules. We introduce a set of optimizations, including multi-level parallelization, replacing CPU code with GPU-accelerated code, vectorized clustering, and memory-efficient post-processing. Evaluated on real-world farm video footage, these changes deliver up to a 2x speedup across pipelines without compromising model accuracy. Our findings highlight practical strategies for building high-throughput, low-latency video inference systems that reduce infrastructure demands in agricultural and smart sensing deployments as well as other large-scale video analytics applications.

Related papers

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models [78.32948112203228]
Video understanding represents the most challenging frontier in computer vision. The recent emergence of Video-Large Multimodal Models has demonstrated remarkable capabilities in video understanding tasks. This survey aims to provide researchers and practitioners with a unified framework for advancing Video-LMM capabilities.
arXiv Detail & Related papers (2025-10-06T17:10:44Z)
BiVM: Accurate Binarized Neural Network for Efficient Video Matting [56.000594826508504]
Deep neural networks for real-time video matting suffer significant computational limitations on edge devices. We present BiVM, an accurate and resource-efficient Binarized neural network for Video Matting. BiVM surpasses alternative binarized video matting networks, including state-of-the-art (SOTA) binarization methods, by a substantial margin.
arXiv Detail & Related papers (2025-07-06T16:32:37Z)
ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning [68.76048244253582]
We introduce ViaRL, the first framework to leverage rule-based reinforcement learning (RL) for optimizing frame selection in video understanding. ViaRL utilizes the answer accuracy of a downstream model as a reward signal to train a frame selector through trial-and-error. ViaRL consistently delivers superior temporal grounding performance and robust generalization across diverse video understanding tasks.
arXiv Detail & Related papers (2025-05-21T12:29:40Z)
$φ$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation [22.607133083903125]
Inference-time optimization scales computation to derive deliberate reasoning steps for effective performance. We frame the decoding strategy as foresight sampling, leveraging simulated future steps to obtain globally optimal step estimation. Experiments show $φ$-Decoding outperforms strong baselines in both performance and efficiency.
arXiv Detail & Related papers (2025-03-17T15:38:33Z)
Understanding Long Videos via LLM-Powered Entity Relation Graphs [51.13422967711056]
GraphVideoAgent is a framework that maps and monitors the evolving relationships between visual entities throughout the video sequence. Our approach demonstrates remarkable effectiveness when tested against industry benchmarks.
arXiv Detail & Related papers (2025-01-27T10:57:24Z)
Deep Learning and Hybrid Approaches for Dynamic Scene Analysis, Object Detection and Motion Tracking [0.0]
This project aims to develop a robust video surveillance system, which can segment videos into smaller clips based on the detection of activities. It uses CCTV footage, for example, to record only major events, such as the appearance of a person or a thief, so that storage is optimized and digital searches are easier.
arXiv Detail & Related papers (2024-12-05T07:44:40Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
High-level Stream Processing: A Complementary Analysis of Fault Recovery [1.3398445165628463]
We focus on robust deployment setups inspired by requirements for near real-time analytics of a large cloud observability platform. The results indicate significant potential for improving fault recovery and performance. New abstractions for transparent configuration tuning are also needed for large-scale industry setups.
arXiv Detail & Related papers (2024-05-13T16:48:57Z)
Turbo: Opportunistic Enhancement for Edge Video Analytics [15.528497833853146]
We study the problem of opportunistic data enhancement using the non-deterministic and fragmented idle GPU resources. We propose a task-specific discrimination and enhancement module and a model-aware adversarial training mechanism. Our system boosts object detection accuracy by 7.3-11.3% without incurring any latency costs.
arXiv Detail & Related papers (2022-06-29T12:13:30Z)
Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations. To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video. In addition, we develop a multi-scale frame scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z)
A Reinforcement-Learning-Based Energy-Efficient Framework for Multi-Task Video Analytics Pipeline [16.72264118199915]
Video analytics pipelines are energy-intensive due to high data rates and reliance on complex inference algorithms. We propose an adaptive-resolution optimization framework to minimize the energy use of multi-task video analytics pipelines. Our framework has significantly surpassed all baseline methods of similar accuracy on the YouTube-VIS dataset.
arXiv Detail & Related papers (2021-04-09T15:44:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.