Large-Scale Video Analytics through Object-Level Consolidation
- URL: http://arxiv.org/abs/2111.15451v1
- Date: Tue, 30 Nov 2021 14:48:54 GMT
- Title: Large-Scale Video Analytics through Object-Level Consolidation
- Authors: Daniel Rivas, Francesc Guim, Jordà Polo, David Carrera
- Abstract summary: Video analytics enables new use cases, such as smart cities or autonomous driving.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the number of installed cameras grows, so do the compute resources
required to process and analyze all the images captured by these cameras. Video
analytics enables new use cases, such as smart cities or autonomous driving. At
the same time, it urges service providers to install additional compute
resources to cope with the demand while the strict latency requirements push
compute towards the end of the network, forming a geographically distributed
and heterogeneous set of compute locations, shared and resource-constrained.
Such a landscape of shared, distributed compute locations forces us to design
new techniques that can optimize and distribute work among all available
locations and, ideally, make compute requirements grow sublinearly with the
number of cameras installed. In this paper, we present FoMO (Focus on Moving
Objects), a method that optimizes multi-camera deployments by preprocessing
each scene's images, filtering out empty regions, and composing regions of
interest from multiple cameras into a single image that serves as
input for a pre-trained object detection model. Results show that overall
system performance can be increased by 8x while accuracy improves by 40% as a
by-product of the methodology, all using an off-the-shelf pre-trained model
with no additional training or fine-tuning.
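The filter-and-compose idea in the abstract can be sketched in a few lines. This is a minimal, illustrative version only: it uses block-level frame differencing against a static background to find active regions and packs them into a single grid image, standing in for one detector invocation. The block size, thresholds, and all function names are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np

BLOCK = 32  # tile size; block granularity is an assumption, not from the paper

def active_blocks(frame, background, thresh=20.0):
    """Return tiles of `frame` whose pixels changed vs. the background.
    Simple frame differencing stands in for FoMO's scene preprocessing,
    which filters the empty regions out."""
    diff = np.abs(frame.astype(np.float32) - background.astype(np.float32))
    h, w = frame.shape[:2]
    tiles = []
    for r in range(0, h - h % BLOCK, BLOCK):
        for c in range(0, w - w % BLOCK, BLOCK):
            if diff[r:r + BLOCK, c:c + BLOCK].mean() > thresh:
                tiles.append(frame[r:r + BLOCK, c:c + BLOCK])
    return tiles

def compose(tiles, cols=4):
    """Pack tiles from all cameras into one grid image so a single
    detector call covers every active region; pads with zeros."""
    if not tiles:
        return np.zeros((BLOCK, BLOCK), dtype=np.uint8)
    rows = -(-len(tiles) // cols)  # ceiling division
    canvas = np.zeros((rows * BLOCK, cols * BLOCK), dtype=tiles[0].dtype)
    for i, tile in enumerate(tiles):
        r, c = divmod(i, cols)
        canvas[r * BLOCK:(r + 1) * BLOCK, c * BLOCK:(c + 1) * BLOCK] = tile
    return canvas

# Two synthetic grayscale "cameras": a static background plus one bright
# moving object each; only the tiles containing the objects survive.
bg = np.zeros((96, 128), dtype=np.uint8)
cams = []
for x in (10, 70):
    frame = bg.copy()
    frame[40:60, x:x + 20] = 255  # the moving object
    cams.append(frame)

tiles = [t for f in cams for t in active_blocks(f, bg)]
mosaic = compose(tiles)
print(len(tiles), mosaic.shape)  # 2 active tiles packed into one 32x128 image
```

The composed mosaic is what would be fed to the off-the-shelf detector, so the detector cost scales with the number of active regions rather than with the number of cameras.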
Related papers
- One Diffusion to Generate Them All [54.82732533013014]
OneDiffusion is a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding.
It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps.
OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs.
arXiv Detail & Related papers (2024-11-25T12:11:05Z)
- Large-scale Remote Sensing Image Target Recognition and Automatic Annotation [0.0]
This paper presents a method for object recognition and automatic labeling in large-area remote sensing images called LRSAA.
The method integrates YOLOv11 and MobileNetV3-SSD object detection algorithms through ensemble learning to enhance model performance.
arXiv Detail & Related papers (2024-11-12T13:57:13Z)
- Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images.
We apply a diversity-based sampling algorithm to optimize the camera selection.
We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
arXiv Detail & Related papers (2024-09-11T08:36:49Z)
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
- Enabling Cross-Camera Collaboration for Video Analytics on Distributed Smart Cameras [7.609628915907225]
We present Argus, a distributed video analytics system with cross-camera collaboration on smart cameras.
We identify multi-camera, multi-target tracking as the primary task in multi-camera video analytics and develop a novel technique that avoids redundant, processing-heavy tasks.
Argus reduces the number of object identifications and end-to-end latency by up to 7.13x and 2.19x compared to the state-of-the-art.
arXiv Detail & Related papers (2024-01-25T12:27:03Z)
- Learning Online Policies for Person Tracking in Multi-View Environments [4.62316736194615]
We introduce MVSparse, a novel framework for cooperative multi-person tracking across multiple synchronized cameras.
The MVSparse system is comprised of a carefully orchestrated pipeline, combining edge server-based models with distributed lightweight Reinforcement Learning (RL) agents.
Notably, our contributions include an empirical analysis of multi-camera pedestrian tracking datasets, the development of a multi-camera, multi-person detection pipeline, and the implementation of MVSparse.
arXiv Detail & Related papers (2023-12-26T02:57:11Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2022-04-10T19:16:58Z)
- Towards Unsupervised Fine-Tuning for Edge Video Analytics [1.1091582432763736]
We propose a method for improving accuracy of edge models without any extra compute cost by means of automatic model specialization.
Results show that our method can automatically improve accuracy of pre-trained models by an average of 21%.
arXiv Detail & Related papers (2021-04-14T12:57:40Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.