Online 3D Multi-Camera Perception through Robust 2D Tracking and Depth-based Late Aggregation
- URL: http://arxiv.org/abs/2509.09946v1
- Date: Fri, 12 Sep 2025 03:28:35 GMT
- Title: Online 3D Multi-Camera Perception through Robust 2D Tracking and Depth-based Late Aggregation
- Authors: Vu-Minh Le, Thao-Anh Tran, Duc Huy Do, Xuan Canh Do, Huong Ninh, Hai Tran
- Abstract summary: Multi-Target Multi-Camera Tracking (MTMC) is an essential computer vision task for automating large-scale surveillance. We present an approach for extending any online 2D multi-camera tracking system into 3D space by utilizing depth information. The proposed framework is evaluated on the 2025 AI City Challenge's 3D MTMC dataset, achieving 3rd place on the leaderboard.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-Target Multi-Camera Tracking (MTMC) is an essential computer vision task for automating large-scale surveillance. With camera calibration and depth information, the targets in the scene can be projected into 3D space, offering unparalleled levels of automatic perception of a 3D environment. However, tracking in the 3D space requires replacing all 2D tracking components from the ground up, which may be infeasible for existing MTMC systems. In this paper, we present an approach for extending any online 2D multi-camera tracking system into 3D space by utilizing depth information to reconstruct a target in point-cloud space, and recovering its 3D box through clustering and yaw refinement after tracking. We also introduce an enhanced online data association mechanism that leverages each target's local ID consistency to assign global IDs across frames. The proposed framework is evaluated on the 2025 AI City Challenge's 3D MTMC dataset, achieving 3rd place on the leaderboard.
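To make the method concrete, here is a minimal Python sketch of the depth-based late-aggregation idea: back-project the pixels of a tracked 2D box into a point cloud, recover a 3D box through clustering and yaw refinement, and assign a global ID by majority vote over local IDs. The function names, the DBSCAN and PCA choices, and all parameter values are illustrative assumptions rather than the authors' implementation; the abstract only specifies "clustering and yaw refinement" and "local ID consistency".

```python
import numpy as np
from sklearn.cluster import DBSCAN  # assumed clustering choice; the paper only says "clustering"

def backproject_box(depth, box, K):
    """Lift pixels inside a 2D box (x1, y1, x2, y2) into camera-space 3D points."""
    x1, y1, x2, y2 = (int(v) for v in box)
    us, vs = np.meshgrid(np.arange(x1, x2), np.arange(y1, y2))
    zs = depth[vs, us]
    us, vs, zs = us[zs > 0], vs[zs > 0], zs[zs > 0]    # drop pixels without depth
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.stack([(us - cx) * zs / fx, (vs - cy) * zs / fy, zs], axis=1)

def fit_3d_box(points, eps=0.3, min_samples=20):
    """Keep the largest cluster, then refine yaw by PCA on the ground-plane footprint."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    if not (labels >= 0).any():
        return None
    pts = points[labels == np.bincount(labels[labels >= 0]).argmax()]
    xz = pts[:, [0, 2]] - pts[:, [0, 2]].mean(axis=0)  # assume y is the "up" axis
    _, vecs = np.linalg.eigh(xz.T @ xz)                # principal footprint direction
    yaw = float(np.arctan2(vecs[1, -1], vecs[0, -1]))
    c, s = np.cos(yaw), np.sin(yaw)
    aligned = xz @ np.array([[c, -s], [s, c]])         # rotate footprint by -yaw
    size = np.array([np.ptp(aligned[:, 0]),            # length along the yaw axis
                     np.ptp(pts[:, 1]),                # height
                     np.ptp(aligned[:, 1])])           # width
    return pts.mean(axis=0), size, yaw

def vote_global_id(local_to_global, local_ids):
    """Exploit local-ID consistency: the per-camera local IDs observed for a
    target vote for the global ID they were previously associated with."""
    votes = [local_to_global[l] for l in local_ids if l in local_to_global]
    return max(set(votes), key=votes.count) if votes else None
```

In this reading, the 2D trackers stay untouched and the 3D box is recovered purely as a per-frame post-processing step, which is what would make the extension drop-in for existing MTMC systems.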
Related papers
- TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels
Monocular 3D tracking aims to capture the long-term motion of pixels in 3D space from a single monocular video. We propose TrackingWorld, a novel pipeline for dense 3D tracking of almost all pixels within a world-centric 3D coordinate system.
arXiv Detail & Related papers (2025-12-09T08:35:42Z)
- SpatialTrackerV2: 3D Point Tracking Made Easy
SpatialTrackerV2 is a feed-forward 3D point tracking method for monocular videos. It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion (see the sketch after this list). By learning geometry and motion jointly from heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30%.
arXiv Detail & Related papers (2025-07-16T17:59:03Z)
- TAPIP3D: Tracking Any Point in Persistent 3D Geometry
We introduce TAPIP3D, a novel approach for long-term 3D point tracking in monocular and RGB-D videos. TAPIP3D represents videos as camera-stabilized feature clouds, leveraging depth and camera motion information. Our 3D-centric formulation significantly improves performance over existing 3D point tracking methods.
arXiv Detail & Related papers (2025-04-20T19:09:43Z)
- DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction
We target the challenge of online 2D and 3D point tracking from unposed monocular camera input. We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion. We aim to inspire the community to advance online point tracking and reconstruction, expanding the applicability to diverse real-world scenarios.
arXiv Detail & Related papers (2024-09-03T17:58:03Z)
- Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training
Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training.
arXiv Detail & Related papers (2023-02-27T17:56:18Z)
- 3D Visual Tracking Framework with Deep Learning for Asteroid Exploration
In this paper, we focus on an accurate and real-time method for 3D tracking.
A new large-scale 3D asteroid tracking dataset is presented, including binocular video sequences, depth maps, and point clouds of diverse asteroids.
We propose a deep-learning-based 3D tracking framework, named Track3D, which combines a 2D monocular tracker with a novel lightweight amodal axis-aligned bounding-box network, A3BoxNet.
arXiv Detail & Related papers (2021-11-21T04:14:45Z)
- PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object Tracking (Accepted as an extended abstract in JRDB-ACT Workshop at CVPR21)
A point cloud is a dense compilation of spatial data in 3D coordinates.
We propose a PointNet-based approach for 3D Multi-Object Tracking (MOT).
arXiv Detail & Related papers (2021-06-03T05:36:39Z)
- Monocular Quasi-Dense 3D Object Tracking
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- 1st Place Solutions for Waymo Open Dataset Challenges -- 2D and 3D Tracking
HorizonMOT is proposed for camera-based 2D tracking in the image space and LiDAR-based 3D tracking in the 3D world space.
Within the tracking-by-detection paradigm, our trackers leverage the high-performing detectors used in the 2D/3D detection challenges, achieving 45.13% 2D MOTA/L2 and 63.45% 3D MOTA/L2.
arXiv Detail & Related papers (2020-06-28T04:49:59Z)
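Several of the papers above, like the main paper, rest on one primitive: lifting a tracked pixel into world coordinates using depth and camera pose, so that camera ego-motion can be separated from object motion (this is the sketch referenced from the SpatialTrackerV2 entry). Below is a minimal illustration under assumed conventions; the helper names, the 4x4 camera-to-world pose format, and the dummy values are assumptions for illustration, not any paper's implementation.

```python
import numpy as np

def pixel_to_world(uv, depth, K, T_world_cam):
    """Back-project pixel (u, v) with metric depth into world coordinates.
    K: (3, 3) intrinsics; T_world_cam: (4, 4) camera-to-world pose (assumed)."""
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])  # viewing ray with z = 1
    p_cam = np.append(depth * ray, 1.0)                     # homogeneous camera point
    return (T_world_cam @ p_cam)[:3]

def object_motion(uv0, uv1, d0, d1, K, T0, T1):
    """World-space displacement with camera ego-motion factored out via the
    per-frame poses; zero for a static point under perfect depth and poses."""
    return pixel_to_world(uv1, d1, K, T1) - pixel_to_world(uv0, d0, K, T0)

# Toy usage with assumed intrinsics and poses: the camera translates 0.5 m
# along x while the same pixel and depth are observed, so the residual is
# nonzero and reads as apparent object motion.
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
T0, T1 = np.eye(4), np.eye(4)
T1[0, 3] = 0.5
print(object_motion((640, 360), (640, 360), 5.0, 5.0, K, T0, T1))  # [0.5 0. 0.]
```

For a truly static point observed in two frames, the two lifted positions coincide and the residual vanishes; any nonzero residual, given perfect depth and poses, is object motion.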
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.