Delving into the Pre-training Paradigm of Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2206.03657v1
- Date: Wed, 8 Jun 2022 03:01:13 GMT
- Authors: Zhuoling Li, Chuanrui Zhang, En Yu, Haoqian Wang
- Abstract summary: We study the pre-training paradigm for monocular 3D object detection (M3OD).
We propose several strategies to further improve this baseline, mainly including target-guided semi-dense depth estimation, keypoint-aware 2D object detection, and class-level loss adjustment.
Combining all the developed techniques, the obtained pre-training framework produces pre-trained backbones that improve M3OD performance significantly on the KITTI-3D and nuScenes benchmarks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The labels of monocular 3D object detection (M3OD) are expensive to obtain.
Meanwhile, large amounts of unlabeled data are usually available in practical
applications, and pre-training is an efficient way of exploiting the knowledge
in such data. However, the pre-training paradigm for M3OD has hardly been
studied. We aim to bridge this gap in this work. To this end, we first draw two
observations: (1) A guideline for devising pre-training tasks is to imitate the
representation of the target task. (2) Combining depth estimation and 2D object
detection is a promising M3OD pre-training baseline. Following this guideline,
we then propose several strategies to further improve this baseline, mainly
target-guided semi-dense depth estimation, keypoint-aware 2D object detection,
and class-level loss adjustment. Combining all the
developed techniques, the obtained pre-training framework produces pre-trained
backbones that improve M3OD performance significantly on both the KITTI-3D and
nuScenes benchmarks. For example, with a DLA34 backbone applied to a naive
center-based M3OD detector, the moderate ${\rm AP}_{3D}70$ score for the Car
class on the KITTI-3D test set improves by a relative 18.71\%, and the NDS
score on the nuScenes validation set improves by a relative 40.41\%.
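The abstract names class-level loss adjustment and a combined depth-estimation-plus-2D-detection pre-training objective, but gives no formulas. The sketch below is a hypothetical illustration of how such a class-reweighted, multi-task pre-training loss could be assembled; the inverse-log frequency weighting, the function names, and the `alpha` trade-off are illustrative assumptions, not the paper's actual method.

```python
import math

def class_weights(sample_counts):
    """Assumed class-level loss adjustment: upweight rare classes by the
    inverse log of their sample frequency, so common classes (e.g. Car)
    do not dominate the detection loss. Illustrative scheme only."""
    total = sum(sample_counts.values())
    return {c: math.log(total / n + 1.0) for c, n in sample_counts.items()}

def pretrain_loss(depth_loss, det_losses_per_class, sample_counts, alpha=1.0):
    """Combine a semi-dense depth-estimation loss (computed on pixels near
    target objects) with per-class 2D detection losses.

    depth_loss: scalar depth-regression loss
    det_losses_per_class: dict mapping class name -> scalar detection loss
    alpha: depth/detection trade-off (assumed hyperparameter)
    """
    w = class_weights(sample_counts)
    det_loss = sum(w[c] * l for c, l in det_losses_per_class.items()) / len(w)
    return alpha * depth_loss + det_loss
```

Under this assumed scheme, a class seen 10x less often receives a noticeably larger weight, which is one plausible reading of "class-level loss adjustment" for imbalanced datasets such as KITTI-3D and nuScenes.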
Related papers
- AdvMono3D: Advanced Monocular 3D Object Detection with Depth-Aware
Robust Adversarial Training [64.14759275211115]
We propose a depth-aware robust adversarial training method for monocular 3D object detection, dubbed DART3D.
Our adversarial training approach capitalizes on the inherent uncertainty, enabling the model to significantly improve its robustness against adversarial attacks.
arXiv Detail & Related papers (2023-09-03T07:05:32Z)
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in autonomous driving because it is easy to deploy.
Most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase.
We propose a new weakly supervised monocular 3D object detection method, which can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z)
- Introducing Depth into Transformer-based 3D Object Detection [24.224177932086455]
We present a Depth-Aware Transformer framework designed for camera-based 3D detection.
We show that DAT achieves a significant improvement of +2.8 NDS on nuScenes val under the same settings.
When using pre-trained VoVNet-99 as the backbone, DAT achieves strong results of 60.0 NDS and 51.5 mAP on nuScenes test.
arXiv Detail & Related papers (2023-02-25T06:28:32Z)
- Self-Supervised 3D Monocular Object Detection by Recycling Bounding Boxes [3.3299316770988625]
The paper studies established self-supervised bounding box recycling, with the labeling of random windows as the pretext task.
We demonstrate improvements of 2-3% in 3D mAP and 0.9-1.5% in BEV scores using SSL over the baseline.
arXiv Detail & Related papers (2022-06-25T21:48:43Z)
- ST3D++: Denoised Self-training for Unsupervised Domain Adaptation on 3D Object Detection [78.71826145162092]
We present a self-training method, named ST3D++, with a holistic pseudo label denoising pipeline for unsupervised domain adaptation on 3D object detection.
We equip the pseudo label generation process with a hybrid quality-aware triplet memory to improve the quality and stability of generated pseudo labels.
In the model training stage, we propose a source data assisted training strategy and a curriculum data augmentation policy.
arXiv Detail & Related papers (2021-08-15T07:49:06Z)
- Is Pseudo-Lidar needed for Monocular 3D Object detection? [32.772699246216774]
We propose an end-to-end, single stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations.
Our architecture is designed for effective information transfer between depth estimation and 3D detection, allowing us to scale with the amount of unlabeled pre-training data.
arXiv Detail & Related papers (2021-08-13T22:22:51Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection [78.71826145162092]
We present a new domain adaptive self-training pipeline, named ST3D, for unsupervised domain adaptation on 3D object detection from point clouds.
Our ST3D achieves state-of-the-art performance on all evaluated datasets and even surpasses fully supervised results on KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2021-03-09T10:51:24Z)
- SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves performance competitive with the state-of-the-art fully-supervised method while using only 50% of the labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.