3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking
- URL: http://arxiv.org/abs/2308.06635v1
- Date: Sat, 12 Aug 2023 19:19:58 GMT
- Title: 3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking
- Authors: Shuxiao Ding, Eike Rehder, Lukas Schneider, Marius Cordts and Juergen
Gall
- Abstract summary: State-of-the-art 3D multi-object tracking (MOT) approaches typically rely on non-learned model-based algorithms such as Kalman Filter.
We propose 3DMOTFormer, a learned geometry-based 3D MOT framework building upon the transformer architecture.
Our approach achieves 71.2% and 68.2% AMOTA on the nuScenes validation and test split, respectively.
- Score: 15.330384668966806
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Tracking 3D objects accurately and consistently is crucial for autonomous
vehicles, enabling more reliable downstream tasks such as trajectory prediction
and motion planning. Based on the substantial progress in object detection in
recent years, the tracking-by-detection paradigm has become a popular choice
due to its simplicity and efficiency. State-of-the-art 3D multi-object tracking
(MOT) approaches typically rely on non-learned model-based algorithms such as
Kalman Filter but require many manually tuned parameters. On the other hand,
learning-based approaches face the problem of adapting the training to the
online setting, leading to inevitable distribution mismatch between training
and inference as well as suboptimal performance. In this work, we propose
3DMOTFormer, a learned geometry-based 3D MOT framework building upon the
transformer architecture. We use an Edge-Augmented Graph Transformer to reason
on the track-detection bipartite graph frame-by-frame and conduct data
association via edge classification. To reduce the distribution mismatch
between training and inference, we propose a novel online training strategy
with an autoregressive and recurrent forward pass as well as sequential batch
optimization. Using CenterPoint detections, our approach achieves 71.2% and
68.2% AMOTA on the nuScenes validation and test split, respectively. In
addition, a trained 3DMOTFormer model generalizes well across different object
detectors. Code is available at: https://github.com/dsx0511/3DMOTFormer.
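The abstract frames data association as edge classification on a track-detection bipartite graph. Below is a minimal sketch of that idea only, assuming a plain MLP edge scorer and greedy matching in place of the paper's Edge-Augmented Graph Transformer; the feature dimensions and the matching rule are likewise assumptions.

```python
# Hypothetical edge-classification association on a track-detection
# bipartite graph. An MLP edge scorer plus greedy matching stands in
# for the paper's Edge-Augmented Graph Transformer.
import torch
import torch.nn as nn

class BipartiteEdgeClassifier(nn.Module):
    def __init__(self, node_dim: int = 64, hidden: int = 128):
        super().__init__()
        # Edge feature: track embedding, detection embedding, and their difference.
        self.mlp = nn.Sequential(
            nn.Linear(3 * node_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, tracks: torch.Tensor, dets: torch.Tensor) -> torch.Tensor:
        # tracks: (T, D), dets: (N, D) -> one logit per track-detection edge, (T, N)
        t = tracks.unsqueeze(1).expand(-1, dets.size(0), -1)
        d = dets.unsqueeze(0).expand(tracks.size(0), -1, -1)
        return self.mlp(torch.cat([t, d, t - d], dim=-1)).squeeze(-1)

def greedy_associate(logits: torch.Tensor, thresh: float = 0.0):
    """Pick highest-scoring edges first; each track/detection is used at most once."""
    matches, used_t, used_d = [], set(), set()
    flat = [(logits[i, j].item(), i, j)
            for i in range(logits.size(0)) for j in range(logits.size(1))]
    for score, i, j in sorted(flat, reverse=True):
        if score < thresh:
            break
        if i not in used_t and j not in used_d:
            matches.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return matches

# Toy usage: 5 existing tracks, 7 new detections.
model = BipartiteEdgeClassifier()
print(greedy_associate(model(torch.randn(5, 64), torch.randn(7, 64))))
```

At training time these edge logits would be supervised with binary labels derived from ground-truth matches; the paper's online training strategy additionally runs the forward pass autoregressively and recurrently over frames, which this sketch omits.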
Related papers
- You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking [9.20064374262956]
The proposed framework can achieve robust tracking by using only a 2D detector and a 3D detector.
It is shown to be more accurate than many of the state-of-the-art TBD-based multi-modal tracking methods.
arXiv Detail & Related papers (2023-04-18T02:45:18Z)
- GOOD: General Optimization-based Fusion for 3D Object Detection via LiDAR-Camera Object Candidates [10.534984939225014]
3D object detection serves as a core basis of perception in autonomous driving.
GOOD is a general optimization-based fusion framework that can achieve satisfying detection without training additional models.
Experiments on both the nuScenes and KITTI datasets show that GOOD outperforms PointPillars by 9.1% in mAP.
arXiv Detail & Related papers (2023-03-17T07:05:04Z)
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in autonomous driving thanks to its ease of deployment.
Most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase.
We propose a new weakly supervised monocular 3D object detection method, which can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z)
- 3D Multi-Object Tracking Using Graph Neural Networks with Cross-Edge Modality Attention [9.150245363036165]
Batch3DMOT represents real-world scenes as directed, acyclic, and category-disjoint tracking graphs.
We present a multi-modal graph neural network that uses a cross-edge attention mechanism to mitigate modality intermittence.
arXiv Detail & Related papers (2022-03-21T12:44:17Z)
- LocATe: End-to-end Localization of Actions in 3D with Transformers [91.28982770522329]
LocATe is an end-to-end approach that jointly localizes and recognizes actions in a 3D sequence.
Unlike transformer-based object-detection and classification models which consider image or patch features as input, LocATe's transformer model is capable of capturing long-term correlations between actions in a sequence.
We introduce a new, challenging, and more realistic benchmark dataset, BABEL-TAL-20 (BT20), where the performance of state-of-the-art methods is significantly worse.
arXiv Detail & Related papers (2022-03-21T03:35:32Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset, achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
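The "Learnable Online Graph Representations" entry above relies on a fully trainable Neural Message Passing network for data association. As a rough illustration only, here is a hypothetical single message-passing round on a track-detection graph; the update rules, dimensions, and sum aggregation are assumptions, not the paper's architecture.

```python
# Hypothetical single message-passing round over a track-detection graph;
# update rules, dimensions, and sum aggregation are assumptions.
import torch
import torch.nn as nn

class MessagePassingRound(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.edge_update = nn.Linear(3 * dim, dim)  # (src, dst, edge) -> new edge
        self.node_update = nn.Linear(2 * dim, dim)  # (node, aggregated msgs) -> new node

    def forward(self, nodes, edge_index, edge_feats):
        # nodes: (V, dim); edge_index: (2, E) src/dst node ids; edge_feats: (E, dim)
        src, dst = edge_index
        edge_feats = torch.relu(self.edge_update(
            torch.cat([nodes[src], nodes[dst], edge_feats], dim=-1)))
        # Sum incoming edge messages at each destination node.
        agg = torch.zeros_like(nodes).index_add_(0, dst, edge_feats)
        nodes = torch.relu(self.node_update(torch.cat([nodes, agg], dim=-1)))
        return nodes, edge_feats

# Toy usage: 6 nodes (tracks + detections) connected by 10 random edges.
V, E, dim = 6, 10, 32
layer = MessagePassingRound(dim)
new_nodes, new_edges = layer(torch.randn(V, dim),
                             torch.randint(0, V, (2, E)),
                             torch.randn(E, dim))
```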
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection [78.71826145162092]
We present a new domain adaptive self-training pipeline, named ST3D, for unsupervised domain adaptation on 3D object detection from point clouds.
Our ST3D achieves state-of-the-art performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2021-03-09T10:51:24Z)
- A two-stage data association approach for 3D Multi-object Tracking [0.0]
We adapt a two-stage data association method, which was successful in image-based tracking, to the 3D setting.
Our method outperforms the baseline that uses one-stage bipartite matching for data association, achieving 0.587 AMOTA on the nuScenes validation set.
arXiv Detail & Related papers (2021-01-21T15:50:17Z)
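The two-stage data association entry above gives no implementation details here; the following generic sketch illustrates the common two-stage recipe (match confident detections under a tight gate first, then the remainder under a looser one), transplanted to 3D center distance. All thresholds, the greedy matcher, and the detection dict layout are assumptions, not the paper's method.

```python
# Generic two-stage association by 3D center distance; thresholds, the
# greedy matcher, and the detection format are assumptions.
import numpy as np

def greedy_match(tracks, dets, max_dist):
    """Greedily pair tracks and detections by smallest 3D center distance."""
    pairs, used_t, used_d = [], set(), set()
    if not tracks or not dets:
        return pairs, list(range(len(tracks))), list(range(len(dets)))
    cost = np.linalg.norm(
        np.array([t["center"] for t in tracks])[:, None]
        - np.array([d["center"] for d in dets])[None, :], axis=-1)
    for i, j in sorted(np.ndindex(*cost.shape), key=lambda ij: cost[ij]):
        if cost[i, j] > max_dist:
            break  # costs are visited in ascending order
        if i not in used_t and j not in used_d:
            pairs.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return (pairs,
            [i for i in range(len(tracks)) if i not in used_t],
            [j for j in range(len(dets)) if j not in used_d])

def two_stage_associate(tracks, dets, conf_split=0.5):
    high = [d for d in dets if d["score"] >= conf_split]
    low = [d for d in dets if d["score"] < conf_split]
    # Stage 1: confident detections under a tight distance gate.
    stage1, rest_ids, _ = greedy_match(tracks, high, max_dist=2.0)
    # Stage 2: remaining tracks vs. low-confidence detections, looser gate.
    stage2, _, _ = greedy_match([tracks[i] for i in rest_ids], low, max_dist=4.0)
    return stage1, stage2  # indices are local to each stage's lists

# Toy usage with two tracks and two detections of differing confidence.
tracks = [{"center": np.array([0.0, 0.0, 0.0])},
          {"center": np.array([5.0, 0.0, 0.0])}]
dets = [{"center": np.array([0.3, 0.1, 0.0]), "score": 0.9},
        {"center": np.array([5.5, 0.2, 0.0]), "score": 0.3}]
print(two_stage_associate(tracks, dets))
```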
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)