Learning Online Policies for Person Tracking in Multi-View Environments
- URL: http://arxiv.org/abs/2312.15858v1
- Date: Tue, 26 Dec 2023 02:57:11 GMT
- Title: Learning Online Policies for Person Tracking in Multi-View Environments
- Authors: Keivan Nalaie, Rong Zheng
- Abstract summary: We introduce MVSparse, a novel framework for cooperative multi-person tracking across multiple synchronized cameras.
The MVSparse system comprises a carefully orchestrated pipeline combining edge server-based models with distributed lightweight Reinforcement Learning (RL) agents.
Notably, our contributions include an empirical analysis of multi-camera pedestrian tracking datasets, the development of a multi-camera, multi-person detection pipeline, and the implementation of MVSparse.
- Score: 4.62316736194615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce MVSparse, a novel and efficient framework for
cooperative multi-person tracking across multiple synchronized cameras. The
MVSparse system comprises a carefully orchestrated pipeline, combining
edge server-based models with distributed lightweight Reinforcement Learning
(RL) agents operating on individual cameras. These RL agents intelligently
select informative blocks within each frame based on historical camera data and
detection outcomes from neighboring cameras, significantly reducing
computational load and communication overhead. The edge server aggregates
multiple camera views to perform detection tasks and provides feedback to the
individual agents. By projecting inputs from various perspectives onto a common
ground plane and applying deep detection models, MVSparse optimally leverages
temporal and spatial redundancy in multi-view videos. Notably, our
contributions include an empirical analysis of multi-camera pedestrian tracking
datasets, the development of a multi-camera, multi-person detection pipeline,
and the implementation of MVSparse, yielding impressive results on both open
datasets and real-world scenarios. Experimentally, MVSparse accelerates overall
inference time by 1.88X and 1.60X compared to a baseline approach while only
marginally compromising tracking accuracy by 2.27% and 3.17%, respectively,
showcasing its promising potential for efficient multi-camera tracking
applications.
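The ground-plane projection the abstract describes can be sketched with a planar homography. The sketch below assumes a pre-calibrated 3x3 camera-to-ground matrix `H` per camera; `project_to_ground` and its calling convention are illustrative, not MVSparse's actual API:

```python
import numpy as np

def project_to_ground(points_px, H):
    """Map pixel coordinates to ground-plane coordinates via a homography.

    points_px: (N, 2) array of (u, v) image coordinates, e.g. the foot
               points of per-camera pedestrian detections.
    H:         3x3 camera-to-ground homography (assumed calibrated offline).
    Returns an (N, 2) array of ground-plane coordinates.
    """
    # Lift to homogeneous coordinates, apply H, then dehomogenize.
    pts = np.hstack([points_px, np.ones((len(points_px), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Sanity check: the identity homography leaves points unchanged.
pts = np.array([[100.0, 200.0], [320.0, 240.0]])
ground = project_to_ground(pts, np.eye(3))
```

Projecting every camera's detections into this shared frame is what lets the edge server fuse overlapping views and expose the spatial redundancy the RL agents exploit.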
Related papers
- MCTR: Multi Camera Tracking Transformer [45.66952089591361]
Multi-Camera Tracking tRansformer (MCTR) is a novel end-to-end approach tailored for multi-object detection and tracking across multiple cameras.
MCTR leverages end-to-end detectors like DEtector TRansformer (DETR) to produce detections and detection embeddings independently for each camera view.
The framework maintains a set of track embeddings that encapsulate global information about the tracked objects, and updates them at every frame by integrating local information from the view-specific detection embeddings.
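The per-frame update of track embeddings from one view's detection embeddings can be pictured as a cross-attention step. The sketch below is a hypothetical single-head variant; the function name and the 0.5 blend factor are assumptions for illustration, not MCTR's actual update module:

```python
import numpy as np

def update_tracks(track_emb, det_embs):
    """Blend view-specific detection embeddings into global track embeddings.

    track_emb: (T, D) global track embeddings.
    det_embs:  (N, D) detection embeddings from one camera view.
    Each track attends over all detections (scaled dot-product softmax)
    and mixes the attended value into its embedding.
    """
    scores = track_emb @ det_embs.T / np.sqrt(track_emb.shape[1])
    # Numerically stable softmax over detections, per track.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return 0.5 * track_emb + 0.5 * (weights @ det_embs)
```

Repeating such an update once per camera view is one way the set of track embeddings can accumulate global information while each detector still runs independently per view.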
arXiv Detail & Related papers (2024-08-23T17:37:03Z) - MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark [63.878793340338035]
Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras.
Existing datasets for this task are either synthetically generated or artificially constructed within a controlled camera network setting.
We present MTMMC, a real-world, large-scale dataset that includes long video sequences captured by 16 multi-modal cameras in two different environments.
arXiv Detail & Related papers (2024-03-29T15:08:37Z) - Enabling Cross-Camera Collaboration for Video Analytics on Distributed
Smart Cameras [7.609628915907225]
We present Argus, a distributed video analytics system with cross-camera collaboration on smart cameras.
We identify multi-camera, multi-target tracking as the primary task in multi-camera video analytics and develop a novel technique that avoids redundant, processing-heavy tasks.
Argus reduces the number of object identifications and end-to-end latency by up to 7.13x and 2.19x compared to the state-of-the-art.
arXiv Detail & Related papers (2024-01-25T12:27:03Z) - Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method achieves significant improvements on the MOT17 and MOT20 datasets while reaching state-of-the-art performance on the DanceTrack dataset.
arXiv Detail & Related papers (2023-11-17T08:17:49Z) - Enhancing Multi-Camera People Tracking with Anchor-Guided Clustering and
Spatio-Temporal Consistency ID Re-Assignment [22.531044994763487]
We propose a novel multi-camera multiple-people tracking method that uses anchor-guided clustering for cross-camera ID re-assignment.
Our approach aims to improve tracking accuracy by identifying key features that are unique to each individual.
The method has demonstrated robustness and effectiveness in handling both synthetic and real-world data.
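Anchor-guided re-assignment can be pictured as nearest-anchor matching in appearance-feature space. The sketch below is a simplified stand-in; the function name and the Euclidean-distance choice are assumptions, not the paper's full clustering and consistency procedure:

```python
import numpy as np

def assign_to_anchors(features, anchors):
    """Nearest-anchor assignment for cross-camera ID re-assignment.

    features: (N, D) appearance features of detections from any camera.
    anchors:  (K, D) one anchor embedding per tracked identity.
    Returns an (N,) array of anchor (identity) indices.
    """
    # Pairwise squared Euclidean distances via broadcasting: (N, K, D) -> (N, K).
    d2 = ((features[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```

A full system would additionally enforce the spatio-temporal consistency the title mentions (e.g. rejecting assignments that imply impossible motion between cameras); this sketch covers only the appearance-matching step.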
arXiv Detail & Related papers (2023-04-19T07:38:15Z) - Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D
Pose Estimation Tracking and Forecasting on a Video Snippet [24.852728097115744]
Multi-person pose understanding from RGB video involves three complex tasks: pose estimation, tracking, and motion forecasting.
Most existing works either focus on a single task or employ multi-stage approaches to solving multiple tasks separately.
We propose Snipper, a unified framework to perform multi-person 3D pose estimation, tracking, and motion forecasting simultaneously in a single stage.
arXiv Detail & Related papers (2022-07-09T18:42:14Z) - Scalable and Real-time Multi-Camera Vehicle Detection,
Re-Identification, and Tracking [58.95210121654722]
We propose a real-time city-scale multi-camera vehicle tracking system that handles real-world, low-resolution CCTV instead of idealized and curated video streams.
Our method is ranked among the top five performers on the public leaderboard.
arXiv Detail & Related papers (2022-04-15T12:47:01Z) - DeepMultiCap: Performance Capture of Multiple Characters Using Sparse
Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z) - Self-supervised Human Detection and Segmentation via Multi-view
Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.