Real-time 3D Deep Multi-Camera Tracking
- URL: http://arxiv.org/abs/2003.11753v1
- Date: Thu, 26 Mar 2020 06:08:19 GMT
- Title: Real-time 3D Deep Multi-Camera Tracking
- Authors: Quanzeng You, Hao Jiang
- Abstract summary: We propose a novel end-to-end tracking pipeline, Deep Multi-Camera Tracking (DMCT), which achieves reliable real-time multi-camera people tracking.
Our system achieves state-of-the-art tracking results while maintaining real-time performance.
- Score: 13.494550690138775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tracking a crowd in 3D using multiple RGB cameras is a challenging task. Most
previous multi-camera tracking algorithms are designed for offline settings and
have high computational complexity. Robust real-time multi-camera 3D tracking
is still an unsolved problem. In this work, we propose a novel end-to-end
tracking pipeline, Deep Multi-Camera Tracking (DMCT), which achieves reliable
real-time multi-camera people tracking. Our DMCT consists of 1) a fast and
novel perspective-aware Deep GroundPoint Network, 2) a fusion procedure for
ground-plane occupancy heatmap estimation, 3) a novel Deep Glimpse Network for
person detection and 4) a fast and accurate online tracker. Our design fully
unleashes the power of deep neural networks to estimate the "ground point" of
each person in each color image, which can be optimized to run efficiently and
robustly. Our fusion procedure, glimpse network and tracker merge the results
from different views, find people candidates using multiple video frames and
then track people on the fused heatmap. Our system achieves
state-of-the-art tracking results while maintaining real-time performance.
Apart from evaluation on the challenging WILDTRACK dataset, we also collect two
more tracking datasets with high-quality labels from two different environments
and camera settings. Our experimental results confirm that our proposed
real-time pipeline gives superior results to previous approaches.
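The fuse-then-track idea in the abstract (per-camera ground-point heatmaps, merged into one ground-plane occupancy map, with peaks tracked over time) can be sketched in a simplified form. The function names, the averaging fusion, and the greedy peak picking below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def fuse_ground_heatmaps(view_heatmaps):
    """Fuse per-camera ground-plane occupancy heatmaps.

    Each heatmap is assumed to be already warped into a common
    ground-plane grid; fusion here is a simple average.
    """
    return np.mean(np.stack(view_heatmaps, axis=0), axis=0)

def find_peaks(heatmap, threshold=0.5, radius=2):
    """Greedy non-maximum suppression: pick the strongest cell,
    suppress its neighborhood, repeat until below threshold."""
    h = heatmap.copy()
    peaks = []
    while True:
        idx = np.unravel_index(np.argmax(h), h.shape)
        if h[idx] < threshold:
            break
        peaks.append(idx)
        r0, c0 = max(idx[0] - radius, 0), max(idx[1] - radius, 0)
        h[r0:idx[0] + radius + 1, c0:idx[1] + radius + 1] = 0.0
    return peaks
```

A real tracker would then associate the surviving peaks across frames; here the peaks are simply the person candidates on the fused map.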
Related papers
- Lifting Multi-View Detection and Tracking to the Bird's Eye View [5.679775668038154]
Recent advancements in multi-view detection and 3D object recognition have significantly improved performance.
We compare modern lifting methods, both parameter-free and parameterized, to multi-view aggregation.
We present an architecture that aggregates the features of multiple times steps to learn robust detection.
arXiv Detail & Related papers (2024-03-19T09:33:07Z)
- ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box [81.45219802386444]
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects across video frames.
We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes.
In 3D scenarios, it is much easier for the tracker to predict object velocities in world coordinates.
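The hierarchical data association strategy summarized above (match high-score boxes to tracks first, then mine the low-score boxes for the remaining tracks) can be sketched roughly as follows; the thresholds and the greedy IoU matching are illustrative simplifications, not the paper's exact algorithm:

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def hierarchical_associate(tracks, detections, score_thresh=0.6, iou_thresh=0.3):
    """Two-stage association: match high-score boxes to tracks first,
    then try the remaining tracks against low-score boxes."""
    high = [d for d in detections if d["score"] >= score_thresh]
    low = [d for d in detections if d["score"] < score_thresh]
    matches, unmatched = [], list(range(len(tracks)))
    for pool in (high, low):
        for det in pool:
            best, best_iou = None, iou_thresh
            for ti in unmatched:
                o = iou(tracks[ti]["box"], det["box"])
                if o > best_iou:
                    best, best_iou = ti, o
            if best is not None:
                matches.append((best, det))
                unmatched.remove(best)
    return matches, unmatched
```

The second pass lets an occluded object with a low-confidence box keep its identity instead of being discarded outright.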
arXiv Detail & Related papers (2023-03-27T15:35:21Z)
- Minkowski Tracker: A Sparse Spatio-Temporal R-CNN for Joint Object Detection and Tracking [53.64390261936975]
We present Minkowski Tracker, a sparse spatio-temporal R-CNN that jointly solves the object detection and tracking problems.
Inspired by region-based CNN (R-CNN), we propose to track motion as a second stage of the object detector R-CNN.
We show in large-scale experiments that the overall performance gain of our method is due to four factors.
arXiv Detail & Related papers (2022-08-22T04:47:40Z)
- Scalable and Real-time Multi-Camera Vehicle Detection, Re-Identification, and Tracking [58.95210121654722]
We propose a real-time city-scale multi-camera vehicle tracking system that handles real-world, low-resolution CCTV instead of idealized and curated video streams.
Our method is ranked among the top five performers on the public leaderboard.
arXiv Detail & Related papers (2022-04-15T12:47:01Z)
- MMPTRACK: Large-scale Densely Annotated Multi-camera Multiple People Tracking Benchmark [40.363608495563305]
We provide a large-scale densely-labeled multi-camera tracking dataset in five different environments with the help of an auto-annotation system.
The 3D tracking results are projected to each RGB camera view using camera parameters to create 2D tracking results.
This dataset provides a more reliable benchmark of multi-camera, multi-object tracking systems in cluttered and crowded environments.
arXiv Detail & Related papers (2021-11-30T06:29:14Z)
- LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking [42.87953709286856]
Multi-Camera Multi-Object Tracking is currently drawing attention in the computer vision field due to its superior performance in real-world applications.
We propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted multicut formulation.
arXiv Detail & Related papers (2021-11-23T14:09:47Z)
- CFTrack: Center-based Radar and Camera Fusion for 3D Multi-Object Tracking [9.62721286522053]
We propose an end-to-end network for joint object detection and tracking based on radar and camera sensor fusion.
Our proposed method uses a center-based radar-camera fusion algorithm for object detection and utilizes a greedy algorithm for object association.
We evaluate our method on the challenging nuScenes dataset, where it achieves 20.0 AMOTA and outperforms all vision-based 3D tracking methods in the benchmark.
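A greedy association of the kind mentioned in the summary above, operating on object centers rather than boxes, might look like this minimal sketch; the function name and the distance threshold are assumptions for illustration, not the paper's implementation:

```python
import math

def greedy_center_association(track_centers, det_centers, max_dist=2.0):
    """Greedily pair each detection with the nearest unclaimed track
    center, taking the closest pairs first."""
    pairs = sorted(
        (math.dist(t, d), ti, di)
        for ti, t in enumerate(track_centers)
        for di, d in enumerate(det_centers)
    )
    used_t, used_d, matches = set(), set(), []
    for dist, ti, di in pairs:
        if dist > max_dist:
            break  # pairs are sorted, so all remaining are too far
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    return matches
```

Greedy matching by center distance is a common lightweight alternative to Hungarian assignment when real-time performance matters.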
arXiv Detail & Related papers (2021-07-11T23:56:53Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net [93.51773847125014]
We propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor.
Our approach performs 3D convolutions across space and time over a bird's eye view representation of the 3D world.
arXiv Detail & Related papers (2020-12-22T22:43:35Z)
- Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets [96.98888948518815]
State-of-the-art multi-object tracking(MOT) methods follow the tracking-by-detection paradigm.
We propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes.
arXiv Detail & Related papers (2020-07-18T19:51:53Z)
- JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset [34.609125601292]
We present JRMOT, a novel 3D MOT system that integrates information from RGB images and 3D point clouds to achieve real-time tracking performance.
As part of our work, we release the JRDB dataset, a novel large scale 2D+3D dataset and benchmark.
The presented 3D MOT system demonstrates state-of-the-art performance against competing methods on the popular 2D tracking KITTI benchmark.
arXiv Detail & Related papers (2020-02-19T19:21:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.