Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic
Segmentation
- URL: http://arxiv.org/abs/2304.11393v1
- Date: Sat, 22 Apr 2023 13:03:19 GMT
- Title: Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic
Segmentation
- Authors: Feng Jiang, Heng Gao, Shoumeng Qiu, Haiqiang Zhang, Ru Wan and Jian Pu
- Abstract summary: We develop an effective 3D-to-BEV knowledge distillation method that transfers rich knowledge from 3D voxel-based models to BEV-based models.
Our framework mainly consists of two modules: the voxel-to-pillar distillation module and the label-weight distillation module.
Label-weight distillation helps the model pay more attention to regions with more height information.
- Score: 6.326177388323946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LiDAR point cloud segmentation is one of the most fundamental tasks for
autonomous driving scene understanding. However, it is difficult for existing
models to achieve both high inference speed and high accuracy simultaneously:
voxel-based methods excel in accuracy, while Bird's-Eye-View (BEV)-based
methods achieve real-time inference. To overcome this trade-off, we
develop an effective 3D-to-BEV knowledge distillation method that transfers
rich knowledge from 3D voxel-based models to BEV-based models. Our framework
mainly consists of two modules: the voxel-to-pillar distillation module and the
label-weight distillation module. Voxel-to-pillar distillation transfers
sparse 3D features to BEV features at the middle layers, making the BEV-based
model aware of more structural and geometric information. Label-weight
distillation helps the model pay more attention to regions with richer height
information. Finally,
we conduct experiments on the SemanticKITTI and Paris-Lille-3D datasets. On
the SemanticKITTI test set, our method yields an improvement of more than 5%
overall, and of more than 15% on classes such as motorcycle and person. The
code can be accessed at
https://github.com/fengjiang5/Knowledge-Distillation-from-Cylinder3D-to-PolarNet.
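Since the abstract only names the two modules, the sketch below gives one plausible reading of them in PyTorch: height-pooling the teacher's voxel features onto the student's BEV grid for the voxel-to-pillar loss, and weighting a per-cell KD term by height extent for the label-weight loss. All shapes, names, and the pooling choice are assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch.nn.functional as F

def voxel_to_pillar_distillation(voxel_feats, bev_feats, proj):
    """Hypothetical voxel-to-pillar loss: collapse the teacher's height
    axis so its 3D features live on the student's BEV grid, then match.

    voxel_feats: (B, Ct, H, W, Z) dense features from the voxel teacher.
    bev_feats:   (B, Cs, H, W) pillar/BEV features from the student.
    proj:        e.g. nn.Conv2d(Ct, Cs, 1) aligning channel widths.
    """
    pillar_teacher = voxel_feats.max(dim=-1).values  # pool height axis
    pillar_teacher = proj(pillar_teacher)            # (B, Cs, H, W)
    return F.mse_loss(bev_feats, pillar_teacher.detach())

def label_weight_distillation(student_logits, teacher_logits, height_span, T=2.0):
    """Hypothetical label-weight loss: a per-cell KD term upweighted where
    the BEV projection discards the most vertical structure.

    student_logits, teacher_logits: (B, K, H, W) class logits.
    height_span: (B, H, W) height extent per BEV cell (max z minus min z).
    """
    w = height_span / (height_span.amax(dim=(1, 2), keepdim=True) + 1e-6)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits.detach() / T, dim=1),
        reduction="none",
    ).sum(dim=1) * T * T                             # (B, H, W)
    return (w * kd).mean()
```

In training, both terms would typically be added as auxiliary losses, with small weights, to the usual segmentation objective.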
Related papers
- Revisiting Birds Eye View Perception Models with Frozen Foundation Models: DINOv2 and Metric3Dv2 [6.42131197643513]
We introduce an innovative application of Metric3Dv2's depth information as a PseudoLiDAR point cloud incorporated into the Simple-BEV architecture.
This integration results in a +3 IoU improvement compared to the Camera-only model.
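As a rough illustration of the pseudo-LiDAR step mentioned above (not the Simple-BEV or Metric3Dv2 code), a predicted metric depth map can be unprojected through the camera intrinsics into a 3D point cloud that then stands in for real LiDAR:

```python
import torch

def depth_to_pseudo_lidar(depth, K):
    """Unproject a metric depth map (H, W) through the 3x3 intrinsics K
    into an (H*W, 3) camera-frame point cloud (the pseudo-LiDAR cloud)."""
    H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype),
        torch.arange(W, dtype=depth.dtype),
        indexing="ij",
    )
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]  # (u - cx) * z / fx
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]  # (v - cy) * z / fy
    return torch.stack([x, y, z], dim=-1)
```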
arXiv Detail & Related papers (2025-01-14T13:51:14Z)
- FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection [33.225938984092274]
We propose a Foreground Self-Distillation (FSD) scheme that effectively avoids the issue of distribution discrepancies.
We also design two Point Cloud Intensification (PCI) strategies to compensate for the sparsity of point clouds.
We develop a Multi-Scale Foreground Enhancement (MSFE) module to extract and fuse multi-scale foreground features.
arXiv Detail & Related papers (2024-07-14T09:39:44Z)
- Three Pillars improving Vision Foundation Model Distillation for Lidar [61.56521056618988]
We study the effect of three pillars for distillation: the 3D backbone, the pretrained 2D backbones, and the pretraining dataset.
Thanks to our scalable distillation method named ScaLR, we show that scaling the 2D and 3D backbones and pretraining on diverse datasets leads to a substantial improvement of the feature quality.
arXiv Detail & Related papers (2023-10-26T15:54:43Z)
- SimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection [56.24700754048067]
Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging.
We propose a Simulated multi-modal Distillation (SimDistill) method by carefully crafting the model architecture and distillation strategy.
Our SimDistill can learn better feature representations for 3D object detection while maintaining a cost-effective camera-only deployment.
arXiv Detail & Related papers (2023-03-29T16:08:59Z)
- TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning [7.6887888234987125]
We propose a learning scheme of Target Inner-Geometry from the LiDAR modality into camera-based BEV detectors.
TiG-BEV can effectively boost BEVDepth by +2.3% NDS and +2.4% mAP, along with BEVDet by +9.1% NDS and +10.3% mAP on nuScenes val set.
arXiv Detail & Related papers (2022-12-28T17:53:43Z)
- 3D Point Cloud Pre-training with Knowledge Distillation from 2D Images [128.40422211090078]
We propose a knowledge distillation method for 3D point cloud pre-trained models to acquire knowledge directly from the 2D representation learning model.
Specifically, we introduce a cross-attention mechanism to extract concept features from the 3D point cloud and compare them with the semantic information from 2D images.
In this scheme, the point cloud pre-trained models learn directly from rich information contained in 2D teacher models.
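A minimal sketch of how such cross-attention distillation could look, with shared learnable queries pooling "concept" features from both modalities; all shapes and names are assumptions, not this paper's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptDistill(nn.Module):
    """Hypothetical cross-attention distillation head: learnable queries
    attend over the 3D student features and the frozen 2D teacher
    features, and the two pooled concept sets are matched."""

    def __init__(self, dim=256, num_queries=32, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        # One attention module is shared across modalities for brevity.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, point_feats, image_feats):
        # point_feats: (B, N, dim) from the 3D backbone (student).
        # image_feats: (B, M, dim) from the 2D teacher, kept frozen.
        q = self.queries.unsqueeze(0).expand(point_feats.size(0), -1, -1)
        concepts_3d, _ = self.attn(q, point_feats, point_feats)
        concepts_2d, _ = self.attn(q, image_feats, image_feats)
        return F.mse_loss(concepts_3d, concepts_2d.detach())
```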
arXiv Detail & Related papers (2022-12-17T23:21:04Z)
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder in learning feature representations.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
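One way BEV-guided masking could work, assuming whole BEV cells are dropped so the encoder must reconstruct coherent spatial regions rather than scattered points; the names and cell size below are illustrative, not BEV-MAE's code:

```python
import torch

def bev_guided_mask(points, cell=0.5, mask_ratio=0.7):
    """Mask whole BEV cells rather than individual points.

    points: (N, 3+) point cloud; cell is the BEV grid size in meters.
    Returns a (N,) boolean mask, True for points whose cell is masked."""
    cells = torch.floor(points[:, :2] / cell).long()             # (N, 2)
    uniq, inv = torch.unique(cells, dim=0, return_inverse=True)
    drop = torch.rand(uniq.size(0)) < mask_ratio                 # per cell
    return drop[inv]                                             # per point
```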
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
- BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection [40.45938603642747]
We propose a unified framework named BEV-LGKD to transfer the knowledge in the teacher-student manner.
Our method uses only LiDAR points to guide the KD between RGB models.
arXiv Detail & Related papers (2022-12-01T16:17:39Z)
- Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation [74.67594286008317]
This article addresses the problem of distilling knowledge from a large teacher model to a slim student network for LiDAR semantic segmentation.
We propose the Point-to-Voxel Knowledge Distillation (PVD), which transfers the hidden knowledge from both point level and voxel level.
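A minimal sketch of distilling at both granularities, assuming per-point and per-voxel class logits are already available from teacher and student; the temperature-scaled KL form is a standard KD choice, not necessarily PVD's exact loss:

```python
import torch.nn.functional as F

def point_and_voxel_kd(stu_point, tea_point, stu_voxel, tea_voxel, T=2.0):
    """Temperature-scaled KL divergence at two levels.

    stu_point/tea_point: (N, K) per-point logits.
    stu_voxel/tea_voxel: (V, K) per-voxel logits (e.g. pooled per voxel)."""
    def kd(s, t):
        return F.kl_div(
            F.log_softmax(s / T, dim=-1),
            F.softmax(t.detach() / T, dim=-1),
            reduction="batchmean",
        ) * T * T
    return kd(stu_point, tea_point) + kd(stu_voxel, tea_voxel)
```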
arXiv Detail & Related papers (2022-06-05T05:28:32Z)
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [105.96557764248846]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework.
It unifies multi-modal features in the shared bird's-eye view representation space.
It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
arXiv Detail & Related papers (2022-05-26T17:59:35Z)
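To make the shared-BEV idea from the BEVFusion entry concrete, here is a minimal sketch of fusing camera and LiDAR features once both have been lifted onto the same BEV grid; the channel sizes and conv fuser are assumptions, not BEVFusion's actual architecture:

```python
import torch
import torch.nn as nn

class SharedBEVFusion(nn.Module):
    """Concatenate camera and LiDAR BEV features and mix them with a
    small convolutional fuser. Channel sizes here are placeholders."""

    def __init__(self, cam_ch=80, lidar_ch=256, out_ch=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_ch + lidar_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev, lidar_bev):
        # Both inputs share the same (B, *, H, W) BEV grid.
        return self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))
```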
This list is automatically generated from the titles and abstracts of the papers on this site.