Large-Scale Video Analytics through Object-Level Consolidation
- URL: http://arxiv.org/abs/2111.15451v1
- Date: Tue, 30 Nov 2021 14:48:54 GMT
- Title: Large-Scale Video Analytics through Object-Level Consolidation
- Authors: Daniel Rivas, Francesc Guim, Jordà Polo, David Carrera
- Abstract summary: Video analytics enables new use cases, such as smart cities or autonomous driving.
- Score: 1.299941371793082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the number of installed cameras grows, so do the compute resources
required to process and analyze all the images captured by these cameras. Video
analytics enables new use cases, such as smart cities or autonomous driving. At
the same time, it urges service providers to install additional compute
resources to cope with the demand while the strict latency requirements push
compute towards the end of the network, forming a geographically distributed
and heterogeneous set of compute locations, shared and resource-constrained.
Such a landscape (shared and distributed locations) forces us to design new
techniques that can optimize and distribute work among all available locations
and, ideally, make compute requirements grow sublinearly with respect to the
number of cameras installed. In this paper, we present FoMO (Focus on Moving
Objects). This method effectively optimizes multi-camera deployments by
preprocessing images for scenes, filtering the empty regions out, and composing
regions of interest from multiple cameras into a single image that serves as
input for a pre-trained object detection model. Results show that overall
system performance can be increased by 8x while accuracy improves by 40% as a
by-product of the methodology, all using an off-the-shelf pre-trained model
with no additional training or fine-tuning.
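
The pipeline the abstract describes (filter out empty regions, then compose regions of interest from several cameras into one detector input) can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: frame differencing stands in for the motion filter, and a naive shelf-packing routine stands in for the composition step. The names `moving_region`, `crop`, and `compose` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def moving_region(prev: Frame, curr: Frame, threshold: int = 20) -> Optional[Box]:
    """Bounding box of pixels that changed between two frames, or None.

    Assumption: simple frame differencing, used here only as a stand-in
    for whatever motion filter the paper actually applies.
    """
    xs, ys = [], []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            if abs(c - p) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # empty scene: nothing to send to the detector
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region of interest out of a frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def compose(crops: List[Frame], canvas_w: int, canvas_h: int):
    """Pack crops left to right on shelves; return the canvas and placements.

    Assumption: naive shelf packing, chosen for brevity rather than density.
    """
    canvas = [[0] * canvas_w for _ in range(canvas_h)]
    placements = []  # (crop index, x offset, y offset)
    x = y = shelf_h = 0
    for i, c in enumerate(crops):
        h, w = len(c), len(c[0])
        if x + w > canvas_w:              # row is full: start a new shelf
            x, y, shelf_h = 0, y + shelf_h, 0
        if w > canvas_w or y + h > canvas_h:
            continue                      # does not fit; skipped in this sketch
        for dy, row in enumerate(c):
            canvas[y + dy][x:x + w] = row
        placements.append((i, x, y))
        x, shelf_h = x + w, max(shelf_h, h)
    return canvas, placements
```

In this sketch, a camera whose scene is static yields `None` and contributes nothing to the composed image, so the detector workload depends on how much moving content there is rather than on the number of cameras. The `placements` list records where each crop landed, which is what a caller would need in order to map detections on the composed image back to the source camera.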
Related papers
- Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images.
We apply a diversity-based sampling algorithm to optimize the camera selection.
We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
arXiv Detail & Related papers (2024-09-11T08:36:49Z)
- KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction [58.04846444985808]
This paper introduces KRONC, a novel approach aimed at inferring view poses by leveraging prior knowledge about the object to reconstruct and its representation through semantic keypoints.
With a focus on vehicle scenes, KRONC is able to estimate the position of the views as a solution to a light optimization problem targeting the convergence of keypoints' back-projections to a singular point.
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-01-25T12:27:03Z)
- Enabling Cross-Camera Collaboration for Video Analytics on Distributed Smart Cameras [7.609628915907225]
We present Argus, a distributed video analytics system with cross-camera collaboration on smart cameras.
We identify multi-camera, multi-target tracking as the primary task of multi-camera video analytics and develop a novel technique that avoids redundant, processing-heavy tasks.
Argus reduces the number of object identifications and end-to-end latency by up to 7.13x and 2.19x, respectively, compared to the state of the art.
arXiv Detail & Related papers (2023-12-26T02:57:11Z)
- Learning Online Policies for Person Tracking in Multi-View Environments [4.62316736194615]
We introduce MVSparse, a novel framework for cooperative multi-person tracking across multiple synchronized cameras.
The MVSparse system is comprised of a carefully orchestrated pipeline, combining edge server-based models with distributed lightweight Reinforcement Learning (RL) agents.
Notably, our contributions include an empirical analysis of multi-camera pedestrian tracking datasets, the development of a multi-camera, multi-person detection pipeline, and the implementation of MVSparse.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-08-02T11:31:43Z)
- Homography Estimation in Complex Topological Scenes [6.023710971800605]
Surveillance videos and images are used for a broad set of applications, ranging from traffic analysis to crime detection.
Extrinsic camera calibration data is important for most analysis applications.
We present an automated camera-calibration process leveraging a dictionary-based approach that does not require prior knowledge of any camera settings.
arXiv Detail & Related papers (2022-04-10T19:16:58Z)
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2021-04-14T12:57:40Z)
- Towards Unsupervised Fine-Tuning for Edge Video Analytics [1.1091582432763736]
We propose a method for improving accuracy of edge models without any extra compute cost by means of automatic model specialization.
Results show that our method can automatically improve accuracy of pre-trained models by an average of 21%.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)