SimDistill: Simulated Multi-modal Distillation for BEV 3D Object
Detection
- URL: http://arxiv.org/abs/2303.16818v4
- Date: Mon, 8 Jan 2024 06:00:01 GMT
- Title: SimDistill: Simulated Multi-modal Distillation for BEV 3D Object
Detection
- Authors: Haimei Zhao, Qiming Zhang, Shanshan Zhao, Zhe Chen, Jing Zhang,
Dacheng Tao
- Abstract summary: Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging.
We propose a Simulated multi-modal Distillation (SimDistill) method by carefully crafting the model architecture and distillation strategy.
Our SimDistill can learn better feature representations for 3D object detection while maintaining a cost-effective camera-only deployment.
- Score: 56.24700754048067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view camera-based 3D object detection has become popular due to its low
cost, but accurately inferring 3D geometry solely from camera data remains
challenging and may lead to inferior performance. Although distilling precise
3D geometry knowledge from LiDAR data could help tackle this challenge, the
benefits of LiDAR information could be greatly hindered by the significant
modality gap between different sensory modalities. To address this issue, we
propose a Simulated multi-modal Distillation (SimDistill) method by carefully
crafting the model architecture and distillation strategy. Specifically, we
devise multi-modal architectures for both teacher and student models, including
a LiDAR-camera fusion-based teacher and a simulated fusion-based student. Owing
to the "identical" architecture design, the student can mimic the teacher to
generate multi-modal features with merely multi-view images as input, where a
geometry compensation module is introduced to bridge the modality gap.
Furthermore, we propose a comprehensive multi-modal distillation scheme that
supports intra-modal, cross-modal, and multi-modal fusion distillation
simultaneously in the Bird's-eye-view space. Incorporating them together, our
SimDistill can learn better feature representations for 3D object detection
while maintaining a cost-effective camera-only deployment. Extensive
experiments validate the effectiveness and superiority of SimDistill over
state-of-the-art methods, achieving an improvement of 4.8% mAP and 4.1% NDS
over the baseline detector. The source code will be released at
https://github.com/ViTAE-Transformer/SimDistill.
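The abstract describes three distillation terms applied jointly in bird's-eye-view (BEV) space: intra-modal, cross-modal, and multi-modal fusion distillation. The sketch below illustrates how such a combined objective could be assembled, assuming plain MSE feature imitation; all tensor names, shapes, and loss weights are hypothetical and are not taken from the released code.

```python
# Minimal sketch of a three-term BEV distillation objective
# (intra-modal, cross-modal, multi-modal fusion), assuming MSE imitation.
# All names are hypothetical; this is not the released SimDistill code.
import torch
import torch.nn.functional as F

def bev_distill_loss(
    s_cam: torch.Tensor,    # student camera-branch BEV feature    (B, C, H, W)
    s_sim: torch.Tensor,    # student simulated-LiDAR BEV feature  (B, C, H, W)
    s_fused: torch.Tensor,  # student fused BEV feature            (B, C, H, W)
    t_cam: torch.Tensor,    # teacher camera-branch BEV feature    (B, C, H, W)
    t_lidar: torch.Tensor,  # teacher LiDAR-branch BEV feature     (B, C, H, W)
    t_fused: torch.Tensor,  # teacher fused BEV feature            (B, C, H, W)
    weights: tuple = (1.0, 1.0, 1.0),
) -> torch.Tensor:
    # The teacher is frozen, so its features are fixed targets.
    t_cam, t_lidar, t_fused = t_cam.detach(), t_lidar.detach(), t_fused.detach()
    intra = F.mse_loss(s_cam, t_cam)      # intra-modal: camera mimics camera
    cross = F.mse_loss(s_sim, t_lidar)    # cross-modal: simulated branch mimics LiDAR
    fused = F.mse_loss(s_fused, t_fused)  # fusion: fused features align
    return weights[0] * intra + weights[1] * cross + weights[2] * fused
```

In this layout, the simulated branch is what lets a camera-only student receive LiDAR-branch supervision at matching feature locations.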
Related papers
- Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data [68.18735997052265]
We propose a balanced approach that combines the advantages of monocular and point cloud-based 3D detection.
Our method requires only a small number of 3D points, which can be obtained from a low-cost, low-resolution sensor.
The accuracy of 3D detection improves by 20% compared to state-of-the-art monocular detection methods.
arXiv Detail & Related papers (2024-04-10T03:54:53Z)
- MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection [42.4932760909941]
Monocular 3D object detection is an indispensable research topic in autonomous driving.
The challenges of Mono3D lie in understanding 3D scene geometry and reconstructing 3D object information from a single image.
Previous methods attempted to transfer 3D information directly from a LiDAR-based teacher to a camera-based student.
arXiv Detail & Related papers (2024-04-07T10:39:04Z)
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- MonoSKD: General Distillation Framework for Monocular 3D Object Detection via Spearman Correlation Coefficient [11.48914285491747]
Existing monocular 3D detection knowledge distillation methods usually project the LiDAR points onto the image plane and train the teacher network accordingly.
We propose MonoSKD, a novel knowledge distillation framework for monocular 3D detection based on the Spearman correlation coefficient.
Our framework achieves state-of-the-art performance as of submission, with no additional inference cost.
arXiv Detail & Related papers (2023-10-17T14:48:02Z)
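MonoSKD's key ingredient is a rank-based distillation loss. Since the Spearman coefficient is the Pearson correlation of ranks, one way to make it trainable is a differentiable soft-rank approximation, as in the schematic sketch below. This is an illustration under that assumption, not the MonoSKD implementation; all names are hypothetical.

```python
# Schematic Spearman-style distillation loss via differentiable soft ranks.
# Hypothetical sketch; not the MonoSKD implementation.
import torch

def soft_rank(x: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    # x: (N,) values; returns differentiable approximate ranks.
    diff = x.unsqueeze(1) - x.unsqueeze(0)       # (N, N): diff[i, j] = x[i] - x[j]
    return torch.sigmoid(diff / tau).sum(dim=1)  # ~ number of elements below x[i]

def spearman_distill_loss(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    rs = soft_rank(student.flatten())
    rt = soft_rank(teacher.detach().flatten())   # teacher is a fixed target
    rs = (rs - rs.mean()) / (rs.std() + 1e-6)
    rt = (rt - rt.mean()) / (rt.std() + 1e-6)
    corr = (rs * rt).mean()                      # Pearson correlation of ranks
    return 1.0 - corr                            # maximize rank correlation
```

Matching ranks rather than raw values means the student is not forced to reproduce the teacher's exact feature magnitudes across the modality gap, which is the usual motivation for correlation-based distillation.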
- DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation [25.933070263556374]
3D perception based on representations learned from multi-camera bird's-eye-view (BEV) is trending, as cameras are cost-effective for mass production in the autonomous driving industry.
However, a distinct performance gap remains between multi-camera BEV and LiDAR-based 3D object detection.
We propose to boost the representation learning of a multi-camera BEV-based student detector by training it to imitate the features of a well-trained LiDAR-based teacher detector.
arXiv Detail & Related papers (2023-09-26T17:56:21Z)
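The DistillBEV entry above hinges on cross-modal feature imitation in BEV space. A minimal sketch follows, assuming a 1x1 convolutional adaptation layer to reconcile student and teacher channel widths; the layer, the channel sizes, and the MSE objective are illustrative assumptions, not the paper's actual design.

```python
# Schematic BEV feature imitation with a channel-adaptation layer.
# Hypothetical sketch; channel sizes are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BEVImitationLoss(nn.Module):
    def __init__(self, student_ch: int = 80, teacher_ch: int = 256):
        super().__init__()
        # A 1x1 conv maps camera BEV features into the LiDAR feature space.
        self.adapt = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

    def forward(self, s_bev: torch.Tensor, t_bev: torch.Tensor) -> torch.Tensor:
        # s_bev: (B, student_ch, H, W); t_bev: (B, teacher_ch, H, W)
        return F.mse_loss(self.adapt(s_bev), t_bev.detach())
```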
- UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View [7.1054067852590865]
We propose a universal cross-modality knowledge distillation framework (UniDistill) to improve the performance of single-modality detectors.
UniDistill easily supports LiDAR-to-camera, camera-to-LiDAR, fusion-to-LiDAR and fusion-to-camera distillation paths.
Experiments on nuScenes demonstrate that UniDistill effectively improves the mAP and NDS of student detectors by 2.0%-3.2%.
arXiv Detail & Related papers (2023-03-27T10:50:58Z)
- MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving [15.36416000750147]
We propose a multi-modal 3D semantic segmentation model (MSeg3D) with joint intra-modal feature extraction and inter-modal feature fusion.
MSeg3D shows robustness and improves over the LiDAR-only baseline.
arXiv Detail & Related papers (2023-03-15T13:13:03Z)
- CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation [130.08432609780374]
In 3D action recognition, there exists rich complementary information between skeleton modalities.
We propose a new Cross-modal Mutual Distillation (CMD) framework to exploit this complementary information.
Our approach outperforms existing self-supervised methods and sets a series of new records.
arXiv Detail & Related papers (2022-08-26T06:06:09Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- MonoDistill: Learning Spatial Features for Monocular 3D Object Detection [80.74622486604886]
We propose a simple and effective scheme to introduce spatial information from LiDAR signals into monocular 3D detectors.
We use the resulting data to train a 3D detector with the same architecture as the baseline model.
Experimental results show that the proposed method can significantly boost the performance of the baseline model.
arXiv Detail & Related papers (2022-01-26T09:21:41Z)
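Several entries above (e.g., MonoSKD and MonoDistill) rely on projecting LiDAR points onto the image plane to bring LiDAR spatial information into the camera view. The sketch below shows this standard pinhole projection step with illustrative calibration inputs; it is a generic utility, not code from any of the listed papers.

```python
# Schematic LiDAR-to-image projection producing a sparse depth map.
# Generic pinhole-camera sketch with illustrative calibration inputs.
import numpy as np

def lidar_to_depth_map(points, K, T_cam_lidar, h, w):
    """points: (N, 3) LiDAR xyz; K: (3, 3) camera intrinsics;
    T_cam_lidar: (4, 4) LiDAR-to-camera extrinsics; returns (h, w) depth map."""
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # homogeneous coords
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]      # points in the camera frame
    cam = cam[cam[:, 2] > 0]                    # keep points in front of the camera
    uvz = (K @ cam.T).T                         # apply intrinsics
    u = (uvz[:, 0] / uvz[:, 2]).astype(int)     # pixel column
    v = (uvz[:, 1] / uvz[:, 2]).astype(int)     # pixel row
    z = cam[:, 2]                               # depth along the optical axis
    depth = np.zeros((h, w), dtype=np.float32)  # 0 means no LiDAR return
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[valid], v[valid], z[valid]
    order = np.argsort(-z)                      # write far first, near overwrites
    depth[v[order], u[order]] = z[order]
    return depth
```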
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.