ECCO: Leveraging Cross-Camera Correlations for Efficient Live Video Continuous Learning
- URL: http://arxiv.org/abs/2512.11727v1
- Date: Fri, 12 Dec 2025 17:07:59 GMT
- Title: ECCO: Leveraging Cross-Camera Correlations for Efficient Live Video Continuous Learning
- Authors: Yuze He, Ferdi Kossmann, Srinivasan Seshan, Peter Steenkiste
- Abstract summary: ECCO is a new video analytics framework designed for resource-efficient continuous learning. By identifying cameras that experience similar drift and retraining a shared model for them, ECCO can substantially reduce the associated compute and communication costs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in video analytics address real-time data drift by continuously retraining specialized, lightweight DNN models for individual cameras. However, the current practice of retraining a separate model for each camera suffers from high compute and communication costs, making it unscalable. We present ECCO, a new video analytics framework designed for resource-efficient continuous learning. The key insight is that the data drift, which necessitates model retraining, often shows temporal and spatial correlations across nearby cameras. By identifying cameras that experience similar drift and retraining a shared model for them, ECCO can substantially reduce the associated compute and communication costs. Specifically, ECCO introduces: (i) a lightweight grouping algorithm that dynamically forms and updates camera groups; (ii) a GPU allocator that dynamically assigns GPU resources across different groups to improve retraining accuracy and ensure fairness; and (iii) a transmission controller at each camera that configures frame sampling and coordinates bandwidth sharing with other cameras based on its assigned GPU resources. We conducted extensive evaluations on three distinct datasets for two vision tasks. Compared to leading baselines, ECCO improves retraining accuracy by 6.7%-18.1% using the same compute and communication resources, or supports 3.3 times more concurrent cameras at the same accuracy.
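The abstract does not give the grouping algorithm's details, but the core idea of clustering cameras by drift similarity so they can share one retrained model can be sketched as follows. This is a hypothetical illustration, not ECCO's actual algorithm: the greedy merge rule, the cosine-similarity measure, and the `threshold` value are all assumptions for the sake of the example.

```python
def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_cameras(drift_vectors, threshold=0.9):
    """Greedily group cameras whose recent drift signatures are similar.

    drift_vectors: dict mapping camera_id -> list[float], a per-camera
        summary of recent data drift (hypothetical representation).
    Returns a list of camera-id groups; each group could, in principle,
    share a single retrained model instead of one model per camera.
    """
    groups = []
    for cam, vec in drift_vectors.items():
        placed = False
        for group in groups:
            # Compare against the group's representative (its first member).
            rep = drift_vectors[group[0]]
            if cosine_similarity(vec, rep) >= threshold:
                group.append(cam)
                placed = True
                break
        if not placed:
            groups.append([cam])
    return groups
```

Under this sketch, two cameras whose drift vectors point in nearly the same direction (e.g., both scenes darkening at dusk) land in one group, cutting retraining cost roughly in proportion to the group size; a camera with dissimilar drift keeps its own model.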
Related papers
- EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization [17.622013322533423]
We introduce EVA02-AT, a suite of EVA02-based video-language foundation models tailored to egocentric video understanding tasks. EVA02-AT efficiently transfers an image-based CLIP model into a unified video encoder via single-stage pretraining. We introduce the Symmetric Multi-Similarity (SMS) loss and a novel training framework that advances all soft labels for both positive and negative pairs.
arXiv Detail & Related papers (2025-06-17T09:51:51Z) - EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance [69.40274699401473]
We introduce EPiC, an efficient and precise camera control learning framework. It constructs high-quality anchor videos without expensive camera trajectory annotations. EPiC achieves SOTA performance on RealEstate10K and MiraData for the I2V camera control task.
arXiv Detail & Related papers (2025-05-28T01:45:26Z) - Adapt3R: Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning [28.80962812015936]
Imitation Learning can train robots to perform complex and diverse manipulation tasks, but learned policies are brittle to observations outside the training distribution. We propose Adapt3R, a general-purpose 3D observation encoder that synthesizes data from calibrated RGBD cameras into a vector usable as conditioning for arbitrary IL algorithms. We show across 93 simulated and 6 real tasks that, when trained end-to-end with a variety of IL algorithms, Adapt3R maintains these algorithms' learning capacity while enabling zero-shot transfer to novel embodiments and camera poses.
arXiv Detail & Related papers (2025-03-06T18:17:09Z) - Learning Online Policies for Person Tracking in Multi-View Environments [4.62316736194615]
We introduce MVSparse, a novel framework for cooperative multi-person tracking across multiple synchronized cameras.
The MVSparse system is comprised of a carefully orchestrated pipeline, combining edge server-based models with distributed lightweight Reinforcement Learning (RL) agents.
Notably, our contributions include an empirical analysis of multi-camera pedestrian tracking datasets, the development of a multi-camera, multi-person detection pipeline, and the implementation of MVSparse.
arXiv Detail & Related papers (2023-12-26T02:57:11Z) - Learning from One Continuous Video Stream [70.30084026960819]
We introduce a framework for online learning from a single continuous video stream.
This poses great challenges given the high correlation between consecutive video frames.
We employ pixel-to-pixel modelling as a practical and flexible way to switch between pre-training and single-stream evaluation.
arXiv Detail & Related papers (2023-12-01T14:03:30Z) - Structured Cooperative Learning with Graphical Model Priors [98.53322192624594]
We study how to train personalized models for different tasks on decentralized devices with limited local data.
We propose "Structured Cooperative Learning (SCooL)", in which a cooperation graph across devices is generated by a graphical model.
We evaluate SCooL and compare it with existing decentralized learning methods on an extensive set of benchmarks.
arXiv Detail & Related papers (2023-06-16T02:41:31Z) - A High-Accuracy Unsupervised Person Re-identification Method Using Auxiliary Information Mined from Datasets [53.047542904329866]
We make use of auxiliary information mined from datasets for multi-modal feature learning.
This paper proposes three effective training tricks: Restricted Label Smoothing Cross Entropy Loss (RLSCE), Weight Adaptive Triplet Loss (WATL), and Dynamic Training Iterations (DTI).
arXiv Detail & Related papers (2022-05-06T10:16:18Z) - Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z) - Cross-Camera Feature Prediction for Intra-Camera Supervised Person Re-identification across Distant Scenes [70.30052164401178]
Person re-identification (Re-ID) aims to match person images across non-overlapping camera views.
ICS-DS Re-ID uses cross-camera unpaired data with intra-camera identity labels for training.
A cross-camera feature prediction method mines cross-camera self-supervision information.
Joint learning of global-level and local-level features forms a global-local cross-camera feature prediction scheme.
arXiv Detail & Related papers (2021-07-29T11:27:50Z) - Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z) - Dual-Triplet Metric Learning for Unsupervised Domain Adaptation in Video-Based Face Recognition [8.220945563455848]
A new deep domain adaptation (DA) method is proposed to adapt the CNN embedding of a Siamese network using unlabeled tracklets captured with new video cameras.
The proposed metric learning technique is used to train deep Siamese networks under different training scenarios.
arXiv Detail & Related papers (2020-02-11T05:06:30Z) - CONVINCE: Collaborative Cross-Camera Video Analytics at the Edge [1.5469452301122173]
This paper introduces CONVINCE, a new approach to look at cameras as a collective entity that enables collaborative video analytics pipeline among cameras.
Our results demonstrate that CONVINCE achieves an object identification accuracy of ~91%, by transmitting only about 25% of all the recorded frames.
arXiv Detail & Related papers (2020-02-05T23:55:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.