Sparse4D v3: Advancing End-to-End 3D Detection and Tracking
- URL: http://arxiv.org/abs/2311.11722v1
- Date: Mon, 20 Nov 2023 12:37:58 GMT
- Title: Sparse4D v3: Advancing End-to-End 3D Detection and Tracking
- Authors: Xuewu Lin, Zixiang Pei, Tianwei Lin, Lichao Huang, Zhizhong Su
- Abstract summary: We introduce two auxiliary training tasks and propose decoupled attention to make structural improvements.
We extend the detector into a tracker using a straightforward approach that assigns instance IDs during inference.
Our best model achieved 71.9% NDS and 67.7% AMOTA on the nuScenes test set.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In autonomous driving perception systems, 3D detection and tracking are the
two fundamental tasks. This paper delves deeper into this field, building upon
the Sparse4D framework. We introduce two auxiliary training tasks (Temporal
Instance Denoising and Quality Estimation) and propose decoupled attention to
make structural improvements, leading to significant enhancements in detection
performance. Additionally, we extend the detector into a tracker using a
straightforward approach that assigns instance IDs during inference, further
highlighting the advantages of query-based algorithms. Extensive experiments
conducted on the nuScenes benchmark validate the effectiveness of the proposed
improvements. With ResNet50 as the backbone, we observe gains of 3.0%, 2.2%,
and 7.6% in mAP, NDS, and AMOTA, reaching 46.9%, 56.1%, and 49.0%,
respectively. Our best model achieved 71.9% NDS and 67.7% AMOTA on the
nuScenes test set. Code will be released at
https://github.com/linxuewu/Sparse4D.
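The tracking extension described in the abstract can be sketched in a few lines: a query-based detector becomes a tracker by letting propagated temporal queries carry their instance IDs across frames, while confident newborn instances receive fresh IDs. Below is a minimal illustration of that idea; the names (`TrackManager`, `assign_ids`) and the confidence threshold are assumptions for illustration, not the actual Sparse4D v3 implementation in the linked repository.

```python
from itertools import count

class TrackManager:
    """Assigns persistent instance IDs to detections at inference time."""

    def __init__(self, score_threshold=0.3):
        self.score_threshold = score_threshold
        self._next_id = count(start=1)  # monotonically increasing IDs

    def assign_ids(self, instances):
        """instances: list of dicts with 'score' and, for queries propagated
        from the previous frame, an inherited 'track_id'. Returns the same
        instances with a 'track_id' set on confident ones."""
        for inst in instances:
            if inst["score"] < self.score_threshold:
                inst["track_id"] = None                  # too uncertain to track
            elif inst.get("track_id") is None:
                inst["track_id"] = next(self._next_id)   # newborn instance
            # else: keep the ID inherited from the propagated temporal query
        return instances
```

Because identity is carried by the queries themselves, no separate association step (e.g. Hungarian matching on boxes) is needed at inference.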
Related papers
- KAN-RCBEVDepth: A multi-modal fusion algorithm in object detection for autonomous driving [2.382388777981433]
This paper introduces the KAN-RCBEVDepth method to enhance 3D object detection in autonomous driving.
Our unique Bird's Eye View-based approach significantly improves detection accuracy and efficiency.
The code will be released at https://www.laitiamo.com/laitiamo/RCBEVDepth-KAN.
arXiv Detail & Related papers (2024-08-04T16:54:49Z)
- FocalFormer3D: Focusing on Hard Instance for 3D Object Detection [97.56185033488168]
False negatives (FN) in 3D object detection can lead to potentially dangerous situations in autonomous driving.
In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner.
We instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects.
arXiv Detail & Related papers (2023-08-08T20:06:12Z)
- V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection [73.37781484123536]
We introduce a highly performant 3D object detector for point clouds using the DETR framework.
To address this limitation, we introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method.
We show exceptional results on the challenging ScanNetV2 benchmark.
arXiv Detail & Related papers (2023-08-08T17:14:14Z)
- Window Normalization: Enhancing Point Cloud Understanding by Unifying Inconsistent Point Densities [16.770190781915673]
Downsampling and feature extraction are essential procedures for 3D point cloud understanding.
A window-normalization method is leveraged to unify the point densities in different parts.
A group-wise strategy is proposed to obtain multi-type features, including texture and spatial information.
arXiv Detail & Related papers (2022-12-05T14:09:07Z)
- Minkowski Tracker: A Sparse Spatio-Temporal R-CNN for Joint Object Detection and Tracking [53.64390261936975]
We present Minkowski Tracker, a sparse spatio-temporal R-CNN that jointly solves the object detection and tracking problems.
Inspired by region-based CNN (R-CNN), we propose to track motion as a second stage of the object detector R-CNN.
We show in large-scale experiments that the overall performance gain of our method is due to four factors.
arXiv Detail & Related papers (2022-08-22T04:47:40Z)
- Delving into the Pre-training Paradigm of Monocular 3D Object Detection [10.07932482761621]
We study the pre-training paradigm for monocular 3D object detection (M3OD).
We propose several strategies to further improve this baseline, which mainly include target guided semi-dense depth estimation, keypoint-aware 2D object detection, and class-level loss adjustment.
Combining all the developed techniques, the obtained pre-training framework produces pre-trained backbones that improve M3OD performance significantly on the KITTI-3D and nuScenes benchmarks.
arXiv Detail & Related papers (2022-06-08T03:01:13Z)
- 6D Pose Estimation with Combined Deep Learning and 3D Vision Techniques for a Fast and Accurate Object Grasping [0.19686770963118383]
Real-time robotic grasping is a priority target for highly advanced autonomous systems.
This paper proposes a novel two-stage method that combines fast 2D object recognition using a deep neural network with 3D vision techniques for pose estimation.
The proposed solution has the potential to perform robustly in real-time applications, which require both efficiency and accuracy.
arXiv Detail & Related papers (2021-11-11T15:36:55Z)
- Is Pseudo-Lidar needed for Monocular 3D Object detection? [32.772699246216774]
We propose an end-to-end, single stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations.
Our architecture is designed for effective information transfer between depth estimation and 3D detection, allowing us to scale with the amount of unlabeled pre-training data.
arXiv Detail & Related papers (2021-08-13T22:22:51Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection [76.42897462051067]
3DIoUMatch is a novel semi-supervised method for 3D object detection applicable to both indoor and outdoor scenes.
We leverage a teacher-student mutual learning framework to propagate information from the labeled to the unlabeled train set in the form of pseudo-labels.
Our method consistently improves state-of-the-art methods on both ScanNet and SUN-RGBD benchmarks by significant margins under all label ratios.
arXiv Detail & Related papers (2020-12-08T11:06:26Z)
- PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation [111.7241018610573]
We present PointGroup, a new end-to-end bottom-up architecture for instance segmentation.
We design a two-branch network to extract point features and predict semantic labels and offsets, for shifting each point towards its respective instance centroid.
A clustering component then utilizes both the original and offset-shifted point coordinate sets, taking advantage of their complementary strengths.
We conduct extensive experiments on two challenging datasets, ScanNet v2 and S3DIS, on which our method achieves the highest performance, 63.6% and 64.0%, compared to 54.9% and 54.4% achieved by the former best method.
arXiv Detail & Related papers (2020-04-03T16:26:37Z)
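The PointGroup summary above describes points being shifted toward their predicted instance centroids before clustering, so that instances touching in the original coordinates separate in the shifted ones. A toy NumPy sketch of that idea follows; the greedy `cluster` helper and the hand-written offsets are illustrative assumptions, not the paper's actual grouping algorithm.

```python
import numpy as np

def cluster(points, radius=0.5):
    """Greedy grouping: each unlabeled seed claims all unlabeled points
    within `radius` of it. A stand-in for a real clustering step."""
    labels = -np.ones(len(points), dtype=int)
    current = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        dists = np.linalg.norm(points - points[i], axis=1)
        labels[(dists < radius) & (labels == -1)] = current
        current += 1
    return labels

# Two instances whose points lie close together in the original coordinates
coords = np.array([[0.0, 0.0], [0.2, 0.0], [0.35, 0.0], [0.45, 0.0]])
# Predicted per-point offsets pointing at the two instance centroids (x=0, x=1)
offsets = np.array([[0.0, 0.0], [-0.2, 0.0], [0.65, 0.0], [0.55, 0.0]])

labels_original = cluster(coords)            # merges everything into one group
labels_shifted = cluster(coords + offsets)   # separates the two instances
```

Using both coordinate sets, as the paper's summary suggests, lets the original coordinates preserve spatially isolated instances while the shifted ones split adjacent instances apart.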
This list is automatically generated from the titles and abstracts of the papers in this site.