Track Initialization and Re-Identification for 3D Multi-View Multi-Object Tracking
- URL: http://arxiv.org/abs/2405.18606v1
- Date: Tue, 28 May 2024 21:36:16 GMT
- Title: Track Initialization and Re-Identification for 3D Multi-View Multi-Object Tracking
- Authors: Linh Van Ma, Tran Thien Dat Nguyen, Ba-Ngu Vo, Hyunsung Jang, Moongu Jeon
- Abstract summary: We propose a 3D multi-object tracking (MOT) solution using only 2D detections from monocular cameras.
We exploit the 2D detections and extracted features from multiple cameras to provide a better approximation of the multi-object filtering density.
- Score: 12.389483990547223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a 3D multi-object tracking (MOT) solution using only 2D detections from monocular cameras, which automatically initiates/terminates tracks and resolves track appearance-reappearance and occlusions. Moreover, this approach does not require detector retraining when cameras are reconfigured; only the camera matrices of the reconfigured cameras need to be updated. Our approach is based on a Bayesian multi-object formulation that integrates track initiation/termination, re-identification, occlusion handling, and data association into a single Bayes filtering recursion. However, the exact filter that utilizes all these functionalities is numerically intractable due to the exponentially growing number of terms in the (multi-object) filtering density, while existing approximations trade off some of these functionalities for speed. To this end, we develop a more efficient approximation suitable for online MOT by incorporating object features and kinematics into the measurement model, which improves data association and subsequently reduces the number of terms. Specifically, we exploit the 2D detections and extracted features from multiple cameras to provide a better approximation of the multi-object filtering density to realize the track initiation/termination and re-identification functionalities. Further, incorporating a tractable geometric occlusion model based on 2D projections of 3D objects on the camera planes realizes the occlusion handling functionality of the filter. Evaluation of the proposed solution on challenging datasets demonstrates significant improvements and robustness when camera configurations change on-the-fly, compared to existing multi-view MOT solutions. The source code is publicly available at https://github.com/linh-gist/mv-glmb-ab.
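The abstract notes that reconfiguring a camera only requires updating its camera matrix. As a rough illustration of why that can suffice, the sketch below projects a 3D world point onto an image plane with a 3x4 camera matrix. It is not taken from the authors' code: the function name `project_point` and the matrix values (focal length 1000 px, principal point (640, 360), identity rotation, zero translation) are assumptions chosen for the example.

```python
def project_point(camera_matrix, point_3d):
    """Project a 3D world point to pixel coordinates with a 3x4 camera matrix.

    Returns None when the point is not in front of the camera.
    """
    x, y, z = point_3d
    homogeneous = (x, y, z, 1.0)
    # Multiply the 3x4 matrix by the homogeneous 4-vector.
    u, v, w = (
        sum(row[i] * homogeneous[i] for i in range(4)) for row in camera_matrix
    )
    if w <= 0:
        return None
    # Divide by the third coordinate to get pixel coordinates.
    return (u / w, v / w)


# Hypothetical camera matrix P = K [I | 0] (illustrative values only).
P = [
    [1000.0, 0.0, 640.0, 0.0],
    [0.0, 1000.0, 360.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

pixel = project_point(P, (1.0, 0.5, 10.0))
print(pixel)
```

In a setup like this, moving or swapping a camera changes only `P`; the code that compares projected 3D positions with 2D detections stays the same.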
Related papers
- CAMO-MOT: Combined Appearance-Motion Optimization for 3D Multi-Object Tracking with Camera-LiDAR Fusion [34.42289908350286]
3D Multi-object tracking (MOT) ensures consistency during continuous dynamic detection.
It can be challenging to accurately track the irregular motion of objects for LiDAR-based methods.
We propose a novel camera-LiDAR fusion 3D MOT framework based on the Combined Appearance-Motion Optimization (CAMO-MOT).
arXiv Detail & Related papers (2022-09-06T14:41:38Z)
- Scatter Points in Space: 3D Detection from Multi-view Monocular Images [8.71944437852952]
3D object detection from monocular image(s) is a challenging and long-standing problem of computer vision.
Recent methods tend to aggregate multi-view features by densely sampling a regular 3D grid in space.
We propose a learnable keypoints sampling method, which scatters pseudo surface points in 3D space, in order to keep data sparsity.
arXiv Detail & Related papers (2022-08-31T09:38:05Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- DeepFusionMOT: A 3D Multi-Object Tracking Framework Based on Camera-LiDAR Fusion with Deep Association [8.34219107351442]
This paper proposes a robust camera-LiDAR fusion-based MOT method that achieves a good trade-off between accuracy and speed.
Our proposed method presents obvious advantages over the state-of-the-art MOT methods in terms of both tracking accuracy and processing speed.
arXiv Detail & Related papers (2022-02-24T13:36:29Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first depth estimation is performed, a pseudo LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- Relation3DMOT: Exploiting Deep Affinity for 3D Multi-Object Tracking from View Aggregation [8.854112907350624]
3D multi-object tracking plays a vital role in autonomous navigation.
Many approaches detect objects in 2D RGB sequences for tracking, which lacks reliability when localizing objects in 3D space.
We propose a novel convolutional operation, named RelationConv, to better exploit the correlation between each pair of objects in the adjacent frames.
arXiv Detail & Related papers (2020-11-25T16:14:40Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- A Bayesian Filter for Multi-view 3D Multi-object Tracking with Occlusion Handling [2.824395407508717]
The proposed algorithm has a linear complexity in the total number of detections across the cameras.
It operates in the 3D world frame, and provides 3D trajectory estimates of the objects.
The proposed algorithm is evaluated on the latest WILDTRACKS dataset, and demonstrated to work in very crowded scenes.
arXiv Detail & Related papers (2020-01-13T09:34:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.