Large-Scale Video Analytics through Object-Level Consolidation
- URL: http://arxiv.org/abs/2111.15451v1
- Date: Tue, 30 Nov 2021 14:48:54 GMT
- Title: Large-Scale Video Analytics through Object-Level Consolidation
- Authors: Daniel Rivas, Francesc Guim, Jordà Polo, David Carrera
- Abstract summary: Video analytics enables new use cases, such as smart cities or autonomous driving.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the number of installed cameras grows, so do the compute resources
required to process and analyze all the images captured by these cameras. Video
analytics enables new use cases, such as smart cities or autonomous driving. At
the same time, it urges service providers to install additional compute
resources to cope with the demand while the strict latency requirements push
compute towards the end of the network, forming a geographically distributed
and heterogeneous set of compute locations, shared and resource-constrained.
Such a landscape of shared, distributed compute locations forces us to design
new techniques that can optimize and distribute work among all available
locations and, ideally, make compute requirements grow sublinearly with the
number of cameras installed. In this paper, we present FoMO (Focus on Moving
Objects), a method that optimizes multi-camera deployments by preprocessing
each scene's images, filtering out empty regions, and composing regions of
interest from multiple cameras into a single image that serves as
input for a pre-trained object detection model. Results show that overall
system performance can be increased by 8x while accuracy improves by 40% as a
by-product of the methodology, all using an off-the-shelf pre-trained model
with no additional training or fine-tuning.
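The filter-and-compose idea in the abstract can be sketched in a few lines. This is a minimal, illustrative version only: it uses block-level frame differencing against a static background to find active regions and packs them into a single grid image, standing in for one detector invocation. The block size, thresholds, and all function names are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np

BLOCK = 32  # tile size; block granularity is an assumption, not from the paper

def active_blocks(frame, background, thresh=20.0):
    """Return tiles of `frame` whose pixels changed vs. the background.
    Simple frame differencing stands in for FoMO's scene preprocessing,
    which filters the empty regions out."""
    diff = np.abs(frame.astype(np.float32) - background.astype(np.float32))
    h, w = frame.shape[:2]
    tiles = []
    for r in range(0, h - h % BLOCK, BLOCK):
        for c in range(0, w - w % BLOCK, BLOCK):
            if diff[r:r + BLOCK, c:c + BLOCK].mean() > thresh:
                tiles.append(frame[r:r + BLOCK, c:c + BLOCK])
    return tiles

def compose(tiles, cols=4):
    """Pack tiles from all cameras into one grid image so a single
    detector call covers every active region; pads with zeros."""
    if not tiles:
        return np.zeros((BLOCK, BLOCK), dtype=np.uint8)
    rows = -(-len(tiles) // cols)  # ceiling division
    canvas = np.zeros((rows * BLOCK, cols * BLOCK), dtype=tiles[0].dtype)
    for i, tile in enumerate(tiles):
        r, c = divmod(i, cols)
        canvas[r * BLOCK:(r + 1) * BLOCK, c * BLOCK:(c + 1) * BLOCK] = tile
    return canvas

# Two synthetic grayscale "cameras": a static background plus one bright
# moving object each; only the tiles containing the objects survive.
bg = np.zeros((96, 128), dtype=np.uint8)
cams = []
for x in (10, 70):
    frame = bg.copy()
    frame[40:60, x:x + 20] = 255  # the moving object
    cams.append(frame)

tiles = [t for f in cams for t in active_blocks(f, bg)]
mosaic = compose(tiles)
print(len(tiles), mosaic.shape)  # 2 active tiles packed into one 32x128 image
```

The composed mosaic is what would be fed to the off-the-shelf detector, so the detector cost scales with the number of active regions rather than with the number of cameras.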
Related papers
- One Diffusion to Generate Them All [54.82732533013014]
OneDiffusion is a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding.
It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps.
OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs.
arXiv Detail & Related papers (2024-11-25T12:11:05Z)
- Large-scale Remote Sensing Image Target Recognition and Automatic Annotation [0.0]
This paper presents a method for object recognition and automatic labeling in large-area remote sensing images called LRSAA.
The method integrates YOLOv11 and MobileNetV3-SSD object detection algorithms through ensemble learning to enhance model performance.
arXiv Detail & Related papers (2024-11-12T13:57:13Z)
- Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images.
We apply a diversity-based sampling algorithm to optimize the camera selection.
We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
arXiv Detail & Related papers (2024-09-11T08:36:49Z)
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
- Enabling Cross-Camera Collaboration for Video Analytics on Distributed Smart Cameras [7.609628915907225]
We present Argus, a distributed video analytics system with cross-camera collaboration on smart cameras.
We identify multi-camera, multi-target tracking as the primary task in multi-camera video analytics and develop a novel technique that avoids redundant, processing-heavy tasks.
Argus reduces the number of object identifications and end-to-end latency by up to 7.13x and 2.19x compared to the state-of-the-art.
arXiv Detail & Related papers (2024-01-25T12:27:03Z)
- Learning Online Policies for Person Tracking in Multi-View Environments [4.62316736194615]
We introduce MVSparse, a novel framework for cooperative multi-person tracking across multiple synchronized cameras.
The MVSparse system is comprised of a carefully orchestrated pipeline, combining edge server-based models with distributed lightweight Reinforcement Learning (RL) agents.
Notably, our contributions include an empirical analysis of multi-camera pedestrian tracking datasets, the development of a multi-camera, multi-person detection pipeline, and the implementation of MVSparse.
arXiv Detail & Related papers (2023-12-26T02:57:11Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2022-04-10T19:16:58Z)
- Towards Unsupervised Fine-Tuning for Edge Video Analytics [1.1091582432763736]
We propose a method for improving accuracy of edge models without any extra compute cost by means of automatic model specialization.
Results show that our method can automatically improve accuracy of pre-trained models by an average of 21%.
arXiv Detail & Related papers (2021-04-14T12:57:40Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.