BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for
BEV 3D Object Detection
- URL: http://arxiv.org/abs/2212.00623v1
- Date: Thu, 1 Dec 2022 16:17:39 GMT
- Title: BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for
BEV 3D Object Detection
- Authors: Jianing Li, Ming Lu, Jiaming Liu, Yandong Guo, Li Du, Shanghang Zhang
- Abstract summary: We propose a unified framework named BEV-LGKD to transfer the knowledge in the teacher-student manner.
Our method only uses LiDAR points to guide the KD between RGB models.
- Score: 40.45938603642747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Bird's-Eye-View (BEV) representation has gained increasing
attention in multi-view 3D object detection, which has demonstrated promising
applications in autonomous driving. Although multi-view camera systems can be
deployed at low cost, the lack of depth information makes current approaches
adopt large models for good performance. Therefore, it is essential to improve
the efficiency of BEV 3D object detection. Knowledge Distillation (KD) is one
of the most practical techniques to train efficient yet accurate models.
However, BEV KD is still under-explored to the best of our knowledge. Different
from image classification tasks, BEV 3D object detection approaches are more
complicated and consist of several components. In this paper, we propose a
unified framework named BEV-LGKD to transfer the knowledge in the
teacher-student manner. However, directly applying the teacher-student paradigm
to BEV features fails to achieve satisfactory results due to heavy background
information in RGB cameras. To solve this problem, we propose to leverage the
localization advantage of LiDAR points. Specifically, we transform the LiDAR
points to BEV space and generate the foreground mask and view-dependent mask
for the teacher-student paradigm. Note that our method only uses
LiDAR points to guide the KD between RGB models. As the quality of depth
estimation is crucial for BEV perception, we further introduce depth
distillation to our framework. Our unified framework is simple yet effective
and achieves a significant performance boost. Code will be released.
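The core idea in the abstract, rasterizing LiDAR points into a BEV mask and restricting the teacher-student feature loss to LiDAR-occupied cells, can be sketched as below. This is a minimal illustration, not the paper's released code; the grid size, ranges, and function names are assumptions for the example.

```python
import numpy as np

def lidar_bev_mask(points, grid=8, lo=-40.0, hi=40.0):
    """Rasterize LiDAR (x, y) points into a binary BEV occupancy mask.

    Cells hit by at least one point are treated as foreground; everything
    else (mostly background clutter in the RGB features) is masked out.
    """
    mask = np.zeros((grid, grid), dtype=np.float32)
    cell = (hi - lo) / grid
    for x, y in points:
        if lo <= x < hi and lo <= y < hi:
            mask[int((x - lo) / cell), int((y - lo) / cell)] = 1.0
    return mask

def masked_kd_loss(student_feat, teacher_feat, mask):
    """Mean-squared distillation loss restricted to LiDAR-occupied BEV cells."""
    diff = (student_feat - teacher_feat) ** 2 * mask
    return diff.sum() / max(mask.sum(), 1.0)
```

In the actual framework this mask would weight the distillation loss between the BEV feature maps of two RGB-only models, so LiDAR is needed only at training time, never at deployment.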
Related papers
- Instance-aware Multi-Camera 3D Object Detection with Structural Priors
Mining and Self-Boosting Learning [93.71280187657831]
Camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
- ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D
Object Detection [15.204935788297226]
ODM3D framework entails cross-modal knowledge distillation at various levels to inject LiDAR-domain knowledge into a monocular detector during training.
By identifying foreground sparsity as the main culprit behind existing methods' suboptimal training, we exploit the precise localisation information embedded in LiDAR points.
Our method ranks 1st in both KITTI validation and test benchmarks, significantly surpassing all existing monocular methods, supervised or semi-supervised.
arXiv Detail & Related papers (2023-10-28T07:12:09Z)
- DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal
Knowledge Distillation [25.933070263556374]
3D perception based on representations learned from multi-camera bird's-eye-view (BEV) is trending as cameras are cost-effective for mass production in autonomous driving industry.
There exists a distinct performance gap between multi-camera BEV and LiDAR based 3D object detection.
We propose to boost the representation learning of a multi-camera BEV based student detector by training it to imitate the features of a well-trained LiDAR based teacher detector.
arXiv Detail & Related papers (2023-09-26T17:56:21Z)
- CALICO: Self-Supervised Camera-LiDAR Contrastive Pre-training for BEV
Perception [32.91233926771015]
CALICO is a novel framework that applies contrastive objectives to both LiDAR and camera backbones.
Our framework can be tailored to different backbones and heads, positioning it as a promising approach for multimodal BEV perception.
arXiv Detail & Related papers (2023-06-01T05:06:56Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- SimDistill: Simulated Multi-modal Distillation for BEV 3D Object
Detection [56.24700754048067]
Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging.
We propose a Simulated multi-modal Distillation (SimDistill) method by carefully crafting the model architecture and distillation strategy.
Our SimDistill can learn better feature representations for 3D object detection while maintaining a cost-effective camera-only deployment.
arXiv Detail & Related papers (2023-03-29T16:08:59Z)
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud
Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder learning feature representation.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
- BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object
Detection [17.526914782562528]
3D object detection from multiple image views is a challenging task for visual scene understanding.
We propose BEVDistill, a cross-modal BEV knowledge distillation framework for multi-view 3D object detection.
Our best model achieves 59.4 NDS on the nuScenes test leaderboard, achieving new state-of-the-art in comparison with various image-based detectors.
arXiv Detail & Related papers (2022-11-17T07:26:14Z)
- Structured Knowledge Distillation Towards Efficient and Compact
Multi-View 3D Detection [30.74309289544479]
We propose a structured knowledge distillation framework to improve the efficiency of vision-only BEV detection models.
Experimental results show that our method leads to an average improvement of 2.16 mAP and 2.27 NDS on the nuScenes benchmark.
arXiv Detail & Related papers (2022-11-14T12:51:17Z)
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified
Birds-Eye View Representation [145.6041893646006]
M^2BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M^2BEV infers both tasks with a unified model and improves efficiency.
arXiv Detail & Related papers (2022-04-11T13:43:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.