ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D
Object Detection
- URL: http://arxiv.org/abs/2310.18620v2
- Date: Tue, 7 Nov 2023 02:55:02 GMT
- Title: ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D
Object Detection
- Authors: Weijia Zhang, Dongnan Liu, Chao Ma, Weidong Cai
- Abstract summary: The ODM3D framework entails cross-modal knowledge distillation at various levels to inject LiDAR-domain knowledge into a monocular detector during training.
By identifying foreground sparsity as the main culprit behind existing methods' suboptimal training, we exploit the precise localisation information embedded in LiDAR points.
Our method ranks 1st in both KITTI validation and test benchmarks, significantly surpassing all existing monocular methods, supervised or semi-supervised.
- Score: 15.204935788297226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D object detection (M3OD) is a significant yet inherently
challenging task in autonomous driving due to the absence of explicit depth cues in
a single RGB image. In this paper, we strive to boost currently underperforming
monocular 3D object detectors by leveraging an abundance of unlabelled data via
semi-supervised learning. Our proposed ODM3D framework entails cross-modal
knowledge distillation at various levels to inject LiDAR-domain knowledge into
a monocular detector during training. By identifying foreground sparsity as the
main culprit behind existing methods' suboptimal training, we exploit the
precise localisation information embedded in LiDAR points to enable more
foreground-attentive and efficient distillation via the proposed BEV occupancy
guidance mask, leading to notably improved knowledge transfer and M3OD
performance. Besides, motivated by insights into why existing cross-modal
GT-sampling techniques fail on our task at hand, we further design a novel
cross-modal object-wise data augmentation strategy for effective RGB-LiDAR
joint learning. Our method ranks 1st in both KITTI validation and test
benchmarks, significantly surpassing all existing monocular methods, supervised
or semi-supervised, on both BEV and 3D detection metrics.
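The idea of foreground-attentive distillation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the occupancy mask simply marks BEV grid cells that contain at least one LiDAR point, and that distillation is a mean squared error between student (monocular) and teacher (LiDAR) BEV feature maps averaged over the masked cells only. The function names, grid ranges and cell size are hypothetical and chosen for illustration.

```python
import numpy as np


def bev_occupancy_mask(points, x_range, y_range, cell_size):
    """Return a boolean (H, W) grid; True where a cell holds >= 1 LiDAR point.

    Assumption: occupancy is defined per BEV cell from raw (x, y) coordinates.
    The paper's actual mask construction may differ.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell_size))
    ny = int(round((y_range[1] - y_range[0]) / cell_size))
    mask = np.zeros((ny, nx), dtype=bool)

    x, y = points[:, 0], points[:, 1]
    # Discard points that fall outside the BEV grid.
    inside = (
        (x >= x_range[0]) & (x < x_range[1])
        & (y >= y_range[0]) & (y < y_range[1])
    )
    ix = ((x[inside] - x_range[0]) / cell_size).astype(int)
    iy = ((y[inside] - y_range[0]) / cell_size).astype(int)
    # Guard against floating-point rounding at the upper boundary.
    ix = np.clip(ix, 0, nx - 1)
    iy = np.clip(iy, 0, ny - 1)
    mask[iy, ix] = True
    return mask


def masked_distillation_loss(student_bev, teacher_bev, mask):
    """MSE between (C, H, W) feature maps, averaged over occupied cells only."""
    per_cell = ((student_bev - teacher_bev) ** 2).mean(axis=0)  # (H, W)
    n_occupied = int(mask.sum())
    if n_occupied == 0:
        return 0.0  # no occupied cells, so nothing to distil
    return float(per_cell[mask].sum() / n_occupied)


if __name__ == "__main__":
    # Three toy LiDAR points (x, y, z); the last lies outside the grid.
    pts = np.array([[0.5, 0.5, 0.0], [3.2, 1.1, 0.0], [100.0, 100.0, 0.0]])
    mask = bev_occupancy_mask(pts, x_range=(0.0, 4.0), y_range=(0.0, 2.0), cell_size=1.0)
    student = np.zeros((3, 2, 4))
    teacher = np.ones((3, 2, 4))
    print(mask.astype(int))
    print(masked_distillation_loss(student, teacher, mask))
```

Restricting the average to occupied cells keeps the many empty BEV cells from dominating the loss, which is the foreground-sparsity concern the abstract raises.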
Related papers
- Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving [58.16024314532443]
We introduce LaserMix++, a framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to assist data-efficient learning.
Results demonstrate that LaserMix++ outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations.
This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
arXiv Detail & Related papers (2024-05-08T17:59:53Z)
- Monocular 3D Object Detection with LiDAR Guided Semi Supervised Active Learning [2.16117348324501]
We propose a novel semi-supervised active learning (SSAL) framework for monocular 3D object detection with LiDAR guidance (MonoLiG).
We utilize LiDAR to guide the data selection and training of monocular 3D detectors without introducing any overhead in the inference phase.
Our training strategy attains the top place in KITTI 3D and birds-eye-view (BEV) monocular object detection official benchmarks by improving the BEV Average Precision (AP) by 2.02.
arXiv Detail & Related papers (2023-07-17T11:55:27Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
- Dense Voxel Fusion for 3D Object Detection [10.717415797194896]
Dense Voxel Fusion (DVF) is a sequential fusion method that generates multi-scale dense voxel feature representations.
We train directly with ground truth 2D bounding box labels, avoiding noisy, detector-specific, 2D predictions.
We show that our proposed multi-modal training strategy results in better generalization compared to training using erroneous 2D predictions.
arXiv Detail & Related papers (2022-03-02T04:51:31Z)
- MonoDistill: Learning Spatial Features for Monocular 3D Object Detection [80.74622486604886]
We propose a simple and effective scheme to introduce the spatial information from LiDAR signals to the monocular 3D detectors.
We use the resulting data to train a 3D detector with the same architecture as the baseline model.
Experimental results show that the proposed method can significantly boost the performance of the baseline model.
arXiv Detail & Related papers (2022-01-26T09:21:41Z)
- The Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all the monocular 3D object detectors in the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z)
- SGM3D: Stereo Guided Monocular 3D Object Detection [62.11858392862551]
We propose a stereo-guided monocular 3D object detection network, termed SGM3D.
We exploit robust 3D features extracted from stereo images to enhance the features learned from the monocular image.
Our method can be integrated into many other monocular approaches to boost performance without introducing any extra computational cost.
arXiv Detail & Related papers (2021-12-03T13:57:14Z)
- SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.