MonoDistill: Learning Spatial Features for Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2201.10830v1
- Date: Wed, 26 Jan 2022 09:21:41 GMT
- Title: MonoDistill: Learning Spatial Features for Monocular 3D Object Detection
- Authors: Zhiyu Chong, Xinzhu Ma, Hong Zhang, Yuxin Yue, Haojie Li, Zhihui Wang,
Wanli Ouyang
- Abstract summary: We propose a simple and effective scheme to introduce the spatial information from LiDAR signals to the monocular 3D detectors.
We use the resulting data to train a 3D detector with the same architecture as the baseline model.
Experimental results show that the proposed method can significantly boost the performance of the baseline model.
- Score: 80.74622486604886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection is a fundamental and challenging task for 3D scene
understanding, and the monocular-based methods can serve as an economical
alternative to the stereo-based or LiDAR-based methods. However, accurately
detecting objects in the 3D space from a single image is extremely difficult
due to the lack of spatial cues. To mitigate this issue, we propose a simple
and effective scheme to introduce the spatial information from LiDAR signals to
the monocular 3D detectors, without introducing any extra cost in the inference
phase. In particular, we first project the LiDAR signals into the image plane
and align them with the RGB images. After that, we use the resulting data to
train a 3D detector (LiDAR Net) with the same architecture as the baseline
model. Finally, this LiDAR Net can serve as the teacher to transfer the learned
knowledge to the baseline model. Experimental results show that the proposed
method can significantly boost the performance of the baseline model and ranks
$1^{st}$ among all monocular-based methods on the KITTI benchmark.
Besides, extensive ablation studies are conducted, which further prove the
effectiveness of each part of our designs and illustrate what the baseline
model has learned from the LiDAR Net. Our code will be released at
\url{https://github.com/monster-ghost/MonoDistill}.
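The first step the abstract describes, projecting LiDAR signals into the image plane and aligning them with the RGB images, is a standard calibration-matrix multiply in KITTI-style setups. Below is a minimal NumPy sketch of that step; the function name is hypothetical, and it assumes the LiDAR points have already been transformed into the rectified camera frame (e.g., via KITTI's Tr_velo_to_cam) and that P is a KITTI-style 3x4 projection matrix:

```python
import numpy as np

def project_lidar_to_depth_map(points_cam, P, h, w):
    """Splat LiDAR points (Nx3, in the rectified camera frame) onto the
    image plane, producing a sparse depth map aligned with the RGB image.
    P is a KITTI-style 3x4 camera projection matrix."""
    # Keep only points in front of the camera.
    points_cam = points_cam[points_cam[:, 2] > 0]

    # Homogeneous projection: (u*z, v*z, z)^T = P @ (x, y, z, 1)^T
    ones = np.ones((points_cam.shape[0], 1), dtype=points_cam.dtype)
    uvz = (P @ np.hstack([points_cam, ones]).T).T
    u = np.round(uvz[:, 0] / uvz[:, 2]).astype(int)
    v = np.round(uvz[:, 1] / uvz[:, 2]).astype(int)
    z = points_cam[:, 2]

    # Discard points that fall outside the image bounds.
    mask = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[mask], v[mask], z[mask]

    # Where several points hit one pixel, keep the nearest depth:
    # write in order of decreasing depth so closer points overwrite.
    order = np.argsort(-z)
    depth = np.zeros((h, w), dtype=np.float32)
    depth[v[order], u[order]] = z[order]
    return depth  # sparse depth map at the RGB image's resolution
```

The abstract only states that the projected data is aligned with the RGB images and used to train the LiDAR Net; the full pipeline may additionally densify such sparse maps (e.g., with a depth-completion step) before training, which this sketch does not cover.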
Related papers
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
- ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection [15.204935788297226]
The ODM3D framework entails cross-modal knowledge distillation at various levels to inject LiDAR-domain knowledge into a monocular detector during training.
By identifying foreground sparsity as the main culprit behind existing methods' suboptimal training, we exploit the precise localisation information embedded in LiDAR points.
Our method ranks 1st in both KITTI validation and test benchmarks, significantly surpassing all existing monocular methods, supervised or semi-supervised.
arXiv Detail & Related papers (2023-10-28T07:12:09Z)
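Both MonoDistill and ODM3D above rest on the same cross-modal knowledge-distillation pattern: a frozen, LiDAR-trained teacher supervises the intermediate features and/or outputs of an image-only student, so nothing extra is needed at inference. Neither abstract specifies the exact losses, so the PyTorch sketch below shows only the generic shape of such an objective (an L2 feature-imitation term plus a temperature-softened KL term on classification logits); the weights and the hypothetical function name are illustrative assumptions, not the papers' actual formulations:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat,
                      student_logits, teacher_logits,
                      temperature=4.0, feat_weight=1.0, resp_weight=0.5):
    """Generic cross-modal KD: the teacher (trained on LiDAR-derived
    inputs) is frozen; only the student receives gradients."""
    teacher_feat = teacher_feat.detach()
    teacher_logits = teacher_logits.detach()

    # Feature imitation: pull student feature maps toward the teacher's.
    feat_loss = F.mse_loss(student_feat, teacher_feat)

    # Response distillation: soften both logit sets and match them.
    t = temperature
    resp_loss = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    return feat_weight * feat_loss + resp_weight * resp_loss
```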
- MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient [11.48914285491747]
Existing monocular 3D detection knowledge distillation methods usually project the LiDAR signals onto the image plane and train the teacher network accordingly.
We propose MonoSKD, a novel knowledge distillation framework for monocular 3D detection based on the Spearman correlation coefficient.
Our framework achieves state-of-the-art performance as of submission, with no additional inference computational cost.
arXiv Detail & Related papers (2023-10-17T14:48:02Z)
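MonoSKD's distinguishing choice is to align teacher and student features by their Spearman (rank) correlation rather than by exact values, which makes the supervision invariant to monotone differences in feature scale between the modalities. Spearman correlation is simply Pearson correlation computed on ranks; the minimal PyTorch sketch below shows the plain, evaluation-style computation. Note that hard ranks via argsort are not differentiable, so a usable training loss needs a soft-ranking relaxation in place of argsort; how MonoSKD handles this is not stated in the abstract:

```python
import torch

def spearman_correlation(student_feat, teacher_feat):
    """Spearman correlation between two feature tensors:
    the Pearson correlation of their rank vectors."""
    s = student_feat.flatten().float()
    t = teacher_feat.flatten().float()

    # Double argsort turns values into 0-based ranks (ties broken arbitrarily).
    s_rank = s.argsort().argsort().float()
    t_rank = t.argsort().argsort().float()

    # Pearson correlation of the centered rank vectors.
    s_rank = s_rank - s_rank.mean()
    t_rank = t_rank - t_rank.mean()
    denom = s_rank.norm() * t_rank.norm() + 1e-8
    return (s_rank * t_rank).sum() / denom

# A distillation objective would then maximise the correlation, e.g.
# loss = 1 - spearman_correlation(student_feat, teacher_feat),
# using a differentiable rank approximation during training.
```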
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in autonomous driving owing to its ease of application.
Most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase.
We propose a new weakly supervised monocular 3D object detection method, which can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
- LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector [80.7563981951707]
We propose LIGA-Stereo to learn stereo-based 3D detectors under the guidance of high-level geometry-aware representations of LiDAR-based detection models.
Compared with the state-of-the-art stereo detector, our method improves the 3D detection performance for cars, pedestrians, and cyclists by 10.44%, 5.69%, and 5.97% mAP, respectively.
arXiv Detail & Related papers (2021-08-18T17:24:40Z)
- DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.