3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking
- URL: http://arxiv.org/abs/2308.06635v1
- Date: Sat, 12 Aug 2023 19:19:58 GMT
- Title: 3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking
- Authors: Shuxiao Ding, Eike Rehder, Lukas Schneider, Marius Cordts and Juergen
Gall
- Abstract summary: State-of-the-art 3D multi-object tracking (MOT) approaches typically rely on non-learned model-based algorithms such as Kalman Filter.
We propose 3DMOTFormer, a learned geometry-based 3D MOT framework building upon the transformer architecture.
Our approach achieves 71.2% and 68.2% AMOTA on the nuScenes validation and test split, respectively.
- Score: 15.330384668966806
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Tracking 3D objects accurately and consistently is crucial for autonomous
vehicles, enabling more reliable downstream tasks such as trajectory prediction
and motion planning. Based on the substantial progress in object detection in
recent years, the tracking-by-detection paradigm has become a popular choice
due to its simplicity and efficiency. State-of-the-art 3D multi-object tracking
(MOT) approaches typically rely on non-learned model-based algorithms such as
Kalman Filter but require many manually tuned parameters. On the other hand,
learning-based approaches face the problem of adapting the training to the
online setting, leading to inevitable distribution mismatch between training
and inference as well as suboptimal performance. In this work, we propose
3DMOTFormer, a learned geometry-based 3D MOT framework building upon the
transformer architecture. We use an Edge-Augmented Graph Transformer to reason
on the track-detection bipartite graph frame-by-frame and conduct data
association via edge classification. To reduce the distribution mismatch
between training and inference, we propose a novel online training strategy
with an autoregressive and recurrent forward pass as well as sequential batch
optimization. Using CenterPoint detections, our approach achieves 71.2% and
68.2% AMOTA on the nuScenes validation and test split, respectively. In
addition, a trained 3DMOTFormer model generalizes well across different object
detectors. Code is available at: https://github.com/dsx0511/3DMOTFormer.
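The abstract frames data association as edge classification on a track-detection bipartite graph. Below is a minimal sketch of that idea only, assuming a plain MLP edge scorer and greedy matching in place of the paper's Edge-Augmented Graph Transformer; the feature dimensions and the matching rule are likewise assumptions.

```python
# Hypothetical edge-classification association on a track-detection
# bipartite graph. An MLP edge scorer plus greedy matching stands in
# for the paper's Edge-Augmented Graph Transformer.
import torch
import torch.nn as nn

class BipartiteEdgeClassifier(nn.Module):
    def __init__(self, node_dim: int = 64, hidden: int = 128):
        super().__init__()
        # Edge feature: track embedding, detection embedding, and their difference.
        self.mlp = nn.Sequential(
            nn.Linear(3 * node_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, tracks: torch.Tensor, dets: torch.Tensor) -> torch.Tensor:
        # tracks: (T, D), dets: (N, D) -> one logit per track-detection edge, (T, N)
        t = tracks.unsqueeze(1).expand(-1, dets.size(0), -1)
        d = dets.unsqueeze(0).expand(tracks.size(0), -1, -1)
        return self.mlp(torch.cat([t, d, t - d], dim=-1)).squeeze(-1)

def greedy_associate(logits: torch.Tensor, thresh: float = 0.0):
    """Pick highest-scoring edges first; each track/detection is used at most once."""
    matches, used_t, used_d = [], set(), set()
    flat = [(logits[i, j].item(), i, j)
            for i in range(logits.size(0)) for j in range(logits.size(1))]
    for score, i, j in sorted(flat, reverse=True):
        if score < thresh:
            break
        if i not in used_t and j not in used_d:
            matches.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return matches

# Toy usage: 5 existing tracks, 7 new detections.
model = BipartiteEdgeClassifier()
print(greedy_associate(model(torch.randn(5, 64), torch.randn(7, 64))))
```

At training time these edge logits would be supervised with binary labels derived from ground-truth matches; the paper's online training strategy additionally runs the forward pass autoregressively and recurrently over frames, which this sketch omits.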
Related papers
- You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking [9.20064374262956]
The proposed framework can achieve robust tracking by using only a 2D detector and a 3D detector.
It is shown to be more accurate than many of the state-of-the-art TBD-based multi-modal tracking methods.
arXiv Detail & Related papers (2023-04-18T02:45:18Z)
- GOOD: General Optimization-based Fusion for 3D Object Detection via LiDAR-Camera Object Candidates [10.534984939225014]
3D object detection serves as a core basis of perception in autonomous driving.
GOOD is a general optimization-based fusion framework that can achieve satisfying detection without training additional models.
Experiments on both the nuScenes and KITTI datasets show that GOOD outperforms PointPillars by 9.1% in mAP.
arXiv Detail & Related papers (2023-03-17T07:05:04Z)
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in autonomous driving thanks to its ease of deployment.
Most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase.
We propose a new weakly supervised monocular 3D object detection method, which can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z)
- 3D Multi-Object Tracking Using Graph Neural Networks with Cross-Edge Modality Attention [9.150245363036165]
Batch3DMOT represents real-world scenes as directed, acyclic, and category-disjoint tracking graphs.
We present a multi-modal graph neural network that uses a cross-edge attention mechanism to mitigate modality intermittence.
arXiv Detail & Related papers (2022-03-21T12:44:17Z)
- LocATe: End-to-end Localization of Actions in 3D with Transformers [91.28982770522329]
LocATe is an end-to-end approach that jointly localizes and recognizes actions in a 3D sequence.
Unlike transformer-based object-detection and classification models which consider image or patch features as input, LocATe's transformer model is capable of capturing long-term correlations between actions in a sequence.
We introduce a new, challenging, and more realistic benchmark dataset, BABEL-TAL-20 (BT20), where the performance of state-of-the-art methods is significantly worse.
arXiv Detail & Related papers (2022-03-21T03:35:32Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset, achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
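The "Learnable Online Graph Representations" entry above relies on a fully trainable Neural Message Passing network for data association. As a rough illustration only, here is a hypothetical single message-passing round on a track-detection graph; the update rules, dimensions, and sum aggregation are assumptions, not the paper's architecture.

```python
# Hypothetical single message-passing round over a track-detection graph;
# update rules, dimensions, and sum aggregation are assumptions.
import torch
import torch.nn as nn

class MessagePassingRound(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.edge_update = nn.Linear(3 * dim, dim)  # (src, dst, edge) -> new edge
        self.node_update = nn.Linear(2 * dim, dim)  # (node, aggregated msgs) -> new node

    def forward(self, nodes, edge_index, edge_feats):
        # nodes: (V, dim); edge_index: (2, E) src/dst node ids; edge_feats: (E, dim)
        src, dst = edge_index
        edge_feats = torch.relu(self.edge_update(
            torch.cat([nodes[src], nodes[dst], edge_feats], dim=-1)))
        # Sum incoming edge messages at each destination node.
        agg = torch.zeros_like(nodes).index_add_(0, dst, edge_feats)
        nodes = torch.relu(self.node_update(torch.cat([nodes, agg], dim=-1)))
        return nodes, edge_feats

# Toy usage: 6 nodes (tracks + detections) connected by 10 random edges.
V, E, dim = 6, 10, 32
layer = MessagePassingRound(dim)
new_nodes, new_edges = layer(torch.randn(V, dim),
                             torch.randint(0, V, (2, E)),
                             torch.randn(E, dim))
```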
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection [78.71826145162092]
We present a new domain adaptive self-training pipeline, named ST3D, for unsupervised domain adaptation on 3D object detection from point clouds.
Our ST3D achieves state-of-the-art performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2021-03-09T10:51:24Z)
- A two-stage data association approach for 3D Multi-object Tracking [0.0]
We adapt a two-stage data association method, which was successful in image-based tracking, to the 3D setting.
Our method outperforms the baseline that uses one-stage bipartite matching for data association, achieving 0.587 AMOTA on the nuScenes validation set.
arXiv Detail & Related papers (2021-01-21T15:50:17Z)
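The two-stage data association entry above gives no implementation details here; the following generic sketch illustrates the common two-stage recipe (match confident detections under a tight gate first, then the remainder under a looser one), transplanted to 3D center distance. All thresholds, the greedy matcher, and the detection dict layout are assumptions, not the paper's method.

```python
# Generic two-stage association by 3D center distance; thresholds, the
# greedy matcher, and the detection format are assumptions.
import numpy as np

def greedy_match(tracks, dets, max_dist):
    """Greedily pair tracks and detections by smallest 3D center distance."""
    pairs, used_t, used_d = [], set(), set()
    if not tracks or not dets:
        return pairs, list(range(len(tracks))), list(range(len(dets)))
    cost = np.linalg.norm(
        np.array([t["center"] for t in tracks])[:, None]
        - np.array([d["center"] for d in dets])[None, :], axis=-1)
    for i, j in sorted(np.ndindex(*cost.shape), key=lambda ij: cost[ij]):
        if cost[i, j] > max_dist:
            break  # costs are visited in ascending order
        if i not in used_t and j not in used_d:
            pairs.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return (pairs,
            [i for i in range(len(tracks)) if i not in used_t],
            [j for j in range(len(dets)) if j not in used_d])

def two_stage_associate(tracks, dets, conf_split=0.5):
    high = [d for d in dets if d["score"] >= conf_split]
    low = [d for d in dets if d["score"] < conf_split]
    # Stage 1: confident detections under a tight distance gate.
    stage1, rest_ids, _ = greedy_match(tracks, high, max_dist=2.0)
    # Stage 2: remaining tracks vs. low-confidence detections, looser gate.
    stage2, _, _ = greedy_match([tracks[i] for i in rest_ids], low, max_dist=4.0)
    return stage1, stage2  # indices are local to each stage's lists

# Toy usage with two tracks and two detections of differing confidence.
tracks = [{"center": np.array([0.0, 0.0, 0.0])},
          {"center": np.array([5.0, 0.0, 0.0])}]
dets = [{"center": np.array([0.3, 0.1, 0.0]), "score": 0.9},
        {"center": np.array([5.5, 0.2, 0.0]), "score": 0.3}]
print(two_stage_associate(tracks, dets))
```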
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)