1st Place Solutions for Waymo Open Dataset Challenges -- 2D and 3D
Tracking
- URL: http://arxiv.org/abs/2006.15506v1
- Date: Sun, 28 Jun 2020 04:49:59 GMT
- Title: 1st Place Solutions for Waymo Open Dataset Challenges -- 2D and 3D
Tracking
- Authors: Yu Wang, Sijia Chen, Li Huang, Runzhou Ge, Yihan Hu, Zhuangzhuang
Ding, Jie Liao
- Abstract summary: HorizonMOT is proposed for camera-based 2D tracking in the image space and LiDAR-based 3D tracking in the 3D world space.
Within the tracking-by-detection paradigm, our trackers leverage our high-performing detectors used in the 2D/3D detection challenges and achieved 45.13% 2D MOTA/L2 and 63.45% 3D MOTA/L2.
- Score: 7.807118356899879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report presents the online and real-time 2D and 3D
multi-object tracking (MOT) algorithms that reached the 1st places on both
Waymo Open Dataset 2D tracking and 3D tracking challenges. An efficient and
pragmatic online tracking-by-detection framework named HorizonMOT is proposed
for camera-based 2D tracking in the image space and LiDAR-based 3D tracking in
the 3D world space. Within the tracking-by-detection paradigm, our trackers
leverage our high-performing detectors used in the 2D/3D detection challenges
and achieved 45.13% 2D MOTA/L2 and 63.45% 3D MOTA/L2 in the 2D/3D tracking
challenges.
Related papers
- Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels [67.36972154532761]
Estimating the 3D trajectory of every pixel from a monocular video is crucial and promising for a comprehensive understanding of the 3D dynamics of videos.<n>Recent monocular 3D tracking works demonstrate impressive performance, but are limited to either tracking sparse points on the first frame or a slow optimization-based framework for dense tracking.<n>We propose a feedforward model, called Track4World, enabling an efficient holistic 3D tracking of every pixel in the world-centric coordinate system.
arXiv Detail & Related papers (2026-03-03T03:45:43Z) - TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels [49.456225518469516]
Monocular 3D tracking aims to capture the long-term motion of pixels in 3D space from a single monocular video.<n>We propose TrackingWorld, a novel pipeline for dense 3D tracking of almost all pixels within a world-centric 3D coordinate system.
arXiv Detail & Related papers (2025-12-09T08:35:42Z) - Online 3D Multi-Camera Perception through Robust 2D Tracking and Depth-based Late Aggregation [0.0]
Multi-Target Multi-Camera Tracking (MTMC) is an essential computer vision task for automating large-scale surveillance.<n>We present an approach for extending any online 2D multi-camera tracking system into 3D space by utilizing depth information.<n>The proposed framework is evaluated on the 2025 AI City Challenge's 3D MTMC dataset, achieving 3rd place on the leaderboard.
arXiv Detail & Related papers (2025-09-12T03:28:35Z) - SpatialTrackerV2: 3D Point Tracking Made Easy [73.0350898700048]
SpatialTrackerV2 is a feed-forward 3D point tracking method for monocular videos.<n>It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion.<n>By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30%.
arXiv Detail & Related papers (2025-07-16T17:59:03Z) - TAPIP3D: Tracking Any Point in Persistent 3D Geometry [25.357437591411347]
We introduce TAPIP3D, a novel approach for long-term 3D point tracking in monocular and RGB-D videos.
TAPIP3D represents videos as camera-stabilized feature clouds, leveraging depth and camera motion information.
Our results demonstrate that compensating for camera motion improves tracking performance.
arXiv Detail & Related papers (2025-04-20T19:09:43Z) - Street Gaussians without 3D Object Tracker [86.62329193275916]
Existing methods rely on labor-intensive manual labeling of object poses to reconstruct dynamic objects in canonical space.
We propose a stable object tracking module by leveraging associations from 2D deep trackers within a 3D object fusion strategy.
We address inevitable tracking errors by further introducing a motion learning strategy in an implicit feature space that autonomously corrects trajectory errors and recovers missed detections.
arXiv Detail & Related papers (2024-12-07T05:49:42Z) - BiTrack: Bidirectional Offline 3D Multi-Object Tracking Using Camera-LiDAR Data [11.17376076195671]
"BiTrack" is a 3D OMOT framework that includes modules of 2D-3D detection fusion, initial trajectory generation, and bidirectional trajectory re-optimization.
The experiment results on the KITTI dataset demonstrate that BiTrack achieves the state-of-the-art performance for 3D OMOT tasks in terms of accuracy and efficiency.
arXiv Detail & Related papers (2024-06-26T15:09:54Z) - SpatialTracker: Tracking Any 2D Pixels in 3D Space [71.58016288648447]
We propose to estimate point trajectories in 3D space to mitigate the issues caused by image projection.
Our method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth estimators.
Tracking in 3D allows us to leverage as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts.
arXiv Detail & Related papers (2024-04-05T17:59:25Z) - Tracking Objects with 3D Representation from Videos [57.641129788552675]
We propose a new 2D Multiple Object Tracking paradigm, called P3DTrack.
With 3D object representation learning from Pseudo 3D object labels in monocular videos, we propose a new 2D MOT paradigm, called P3DTrack.
arXiv Detail & Related papers (2023-06-08T17:58:45Z) - ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every
Detection Box [81.45219802386444]
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects across video frames.
We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes.
In 3D scenarios, it is much easier for the tracker to predict object velocities in the world coordinate.
arXiv Detail & Related papers (2023-03-27T15:35:21Z) - A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds [50.54083964183614]
It is non-trivial to perform accurate target-specific detection since the point cloud of objects in raw LiDAR scans is usually sparse and incomplete.
We propose DMT, a Detector-free Motion prediction based 3D Tracking network that totally removes the usage of complicated 3D detectors.
arXiv Detail & Related papers (2022-03-08T17:49:07Z) - 3D Visual Tracking Framework with Deep Learning for Asteroid Exploration [22.808962211830675]
In this paper, we focus on the studied accurate and real-time method for 3D tracking.
A new large-scale 3D asteroid tracking dataset is presented, including binocular video sequences, depth maps, and point clouds of diverse asteroids.
We propose a deep-learning based 3D tracking framework, named as Track3D, which involves 2D monocular tracker and a novel light-weight amodal axis-aligned bounding-box network, A3BoxNet.
arXiv Detail & Related papers (2021-11-21T04:14:45Z) - FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to make a general adapted 2D detector work in this 3D task.
In this technical report, we study this problem with a practice built on fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - F-Siamese Tracker: A Frustum-based Double Siamese Network for 3D Single
Object Tracking [12.644452175343059]
A main challenge in 3D single object tracking is how to reduce search space for generating appropriate 3D candidates.
Instead of relying on 3D proposals, we produce 2D region proposals which are then extruded into 3D viewing frustums.
We perform an online accuracy validation on the 3D frustum to generate refined point cloud searching space.
arXiv Detail & Related papers (2020-10-22T08:01:17Z) - JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset [34.609125601292]
We present JRMOT, a novel 3D MOT system that integrates information from RGB images and 3D point clouds to achieve real-time tracking performance.
As part of our work, we release the JRDB dataset, a novel large scale 2D+3D dataset and benchmark.
The presented 3D MOT system demonstrates state-of-the-art performance against competing methods on the popular 2D tracking KITTI benchmark.
arXiv Detail & Related papers (2020-02-19T19:21:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.