Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic
Segmentation
- URL: http://arxiv.org/abs/2304.11393v1
- Date: Sat, 22 Apr 2023 13:03:19 GMT
- Title: Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic
Segmentation
- Authors: Feng Jiang, Heng Gao, Shoumeng Qiu, Haiqiang Zhang, Ru Wan and Jian Pu
- Abstract summary: We develop an effective 3D-to-BEV knowledge distillation method that transfers rich knowledge from 3D voxel-based models to BEV-based models.
Our framework mainly consists of two modules: the voxel-to-pillar distillation module and the label-weight distillation module.
Label-weight distillation helps the model pay more attention to regions with more height information.
- Score: 6.326177388323946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LiDAR point cloud segmentation is one of the most fundamental tasks for
autonomous driving scene understanding. However, it is difficult for existing
models to achieve both high inference speed and high accuracy simultaneously:
voxel-based methods excel in accuracy, while Bird's-Eye-View (BEV)-based
methods achieve real-time inference. To overcome this trade-off, we
develop an effective 3D-to-BEV knowledge distillation method that transfers
rich knowledge from 3D voxel-based models to BEV-based models. Our framework
mainly consists of two modules: the voxel-to-pillar distillation module and the
label-weight distillation module. Voxel-to-pillar distillation transfers
sparse 3D features to BEV features at the middle layers, making the BEV-based
model aware of more structural and geometric information. Label-weight
distillation helps the model pay more attention to regions with richer height
information. Finally,
we conduct experiments on the SemanticKITTI and Paris-Lille-3D datasets. On
the SemanticKITTI test set, our method yields an improvement of more than 5%
overall, and of more than 15% on classes such as motorcycle and person. The
code can be accessed at
https://github.com/fengjiang5/Knowledge-Distillation-from-Cylinder3D-to-PolarNet.
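Since the abstract only names the two modules, the sketch below gives one plausible reading of them in PyTorch: height-pooling the teacher's voxel features onto the student's BEV grid for the voxel-to-pillar loss, and weighting a per-cell KD term by height extent for the label-weight loss. All shapes, names, and the pooling choice are assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch.nn.functional as F

def voxel_to_pillar_distillation(voxel_feats, bev_feats, proj):
    """Hypothetical voxel-to-pillar loss: collapse the teacher's height
    axis so its 3D features live on the student's BEV grid, then match.

    voxel_feats: (B, Ct, H, W, Z) dense features from the voxel teacher.
    bev_feats:   (B, Cs, H, W) pillar/BEV features from the student.
    proj:        e.g. nn.Conv2d(Ct, Cs, 1) aligning channel widths.
    """
    pillar_teacher = voxel_feats.max(dim=-1).values  # pool height axis
    pillar_teacher = proj(pillar_teacher)            # (B, Cs, H, W)
    return F.mse_loss(bev_feats, pillar_teacher.detach())

def label_weight_distillation(student_logits, teacher_logits, height_span, T=2.0):
    """Hypothetical label-weight loss: a per-cell KD term upweighted where
    the BEV projection discards the most vertical structure.

    student_logits, teacher_logits: (B, K, H, W) class logits.
    height_span: (B, H, W) height extent per BEV cell (max z minus min z).
    """
    w = height_span / (height_span.amax(dim=(1, 2), keepdim=True) + 1e-6)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits.detach() / T, dim=1),
        reduction="none",
    ).sum(dim=1) * T * T                             # (B, H, W)
    return (w * kd).mean()
```

In training, both terms would typically be added as auxiliary losses, with small weights, to the usual segmentation objective.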
Related papers
- Revisiting Birds Eye View Perception Models with Frozen Foundation Models: DINOv2 and Metric3Dv2 [6.42131197643513]
We introduce an innovative application of Metric3Dv2's depth information as a PseudoLiDAR point cloud incorporated into the Simple-BEV architecture.
This integration results in a +3 IoU improvement compared to the Camera-only model.
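As a rough illustration of the pseudo-LiDAR step mentioned above (not the Simple-BEV or Metric3Dv2 code), a predicted metric depth map can be unprojected through the camera intrinsics into a 3D point cloud that then stands in for real LiDAR:

```python
import torch

def depth_to_pseudo_lidar(depth, K):
    """Unproject a metric depth map (H, W) through the 3x3 intrinsics K
    into an (H*W, 3) camera-frame point cloud (the pseudo-LiDAR cloud)."""
    H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype),
        torch.arange(W, dtype=depth.dtype),
        indexing="ij",
    )
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]  # (u - cx) * z / fx
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]  # (v - cy) * z / fy
    return torch.stack([x, y, z], dim=-1)
```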
arXiv Detail & Related papers (2025-01-14T13:51:14Z)
- FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection [33.225938984092274]
We propose a Foreground Self-Distillation (FSD) scheme that effectively avoids the issue of distribution discrepancies.
We also design two Point Cloud Intensification (PCI) strategies to compensate for the sparsity of point clouds.
We develop a Multi-Scale Foreground Enhancement (MSFE) module to extract and fuse multi-scale foreground features.
arXiv Detail & Related papers (2024-07-14T09:39:44Z)
- Three Pillars improving Vision Foundation Model Distillation for Lidar [61.56521056618988]
We study the effect of three pillars for distillation: the 3D backbone, the pretrained 2D backbones, and the pretraining dataset.
Thanks to our scalable distillation method named ScaLR, we show that scaling the 2D and 3D backbones and pretraining on diverse datasets leads to a substantial improvement of the feature quality.
arXiv Detail & Related papers (2023-10-26T15:54:43Z)
- SimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection [56.24700754048067]
Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging.
We propose a Simulated multi-modal Distillation (SimDistill) method by carefully crafting the model architecture and distillation strategy.
Our SimDistill can learn better feature representations for 3D object detection while maintaining a cost-effective camera-only deployment.
arXiv Detail & Related papers (2023-03-29T16:08:59Z)
- TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning [7.6887888234987125]
We propose a learning scheme of Target Inner-Geometry from the LiDAR modality into camera-based BEV detectors.
TiG-BEV can effectively boost BEVDepth by +2.3% NDS and +2.4% mAP, along with BEVDet by +9.1% NDS and +10.3% mAP on nuScenes val set.
arXiv Detail & Related papers (2022-12-28T17:53:43Z)
- 3D Point Cloud Pre-training with Knowledge Distillation from 2D Images [128.40422211090078]
We propose a knowledge distillation method for 3D point cloud pre-trained models to acquire knowledge directly from the 2D representation learning model.
Specifically, we introduce a cross-attention mechanism to extract concept features from the 3D point cloud and compare them with the semantic information from 2D images.
In this scheme, the point cloud pre-trained models learn directly from rich information contained in 2D teacher models.
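A minimal sketch of how such cross-attention distillation could look, with shared learnable queries pooling "concept" features from both modalities; all shapes and names are assumptions, not this paper's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptDistill(nn.Module):
    """Hypothetical cross-attention distillation head: learnable queries
    attend over the 3D student features and the frozen 2D teacher
    features, and the two pooled concept sets are matched."""

    def __init__(self, dim=256, num_queries=32, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        # One attention module is shared across modalities for brevity.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, point_feats, image_feats):
        # point_feats: (B, N, dim) from the 3D backbone (student).
        # image_feats: (B, M, dim) from the 2D teacher, kept frozen.
        q = self.queries.unsqueeze(0).expand(point_feats.size(0), -1, -1)
        concepts_3d, _ = self.attn(q, point_feats, point_feats)
        concepts_2d, _ = self.attn(q, image_feats, image_feats)
        return F.mse_loss(concepts_3d, concepts_2d.detach())
```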
arXiv Detail & Related papers (2022-12-17T23:21:04Z)
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder in learning feature representations.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
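One way BEV-guided masking could work, assuming whole BEV cells are dropped so the encoder must reconstruct coherent spatial regions rather than scattered points; the names and cell size below are illustrative, not BEV-MAE's code:

```python
import torch

def bev_guided_mask(points, cell=0.5, mask_ratio=0.7):
    """Mask whole BEV cells rather than individual points.

    points: (N, 3+) point cloud; cell is the BEV grid size in meters.
    Returns a (N,) boolean mask, True for points whose cell is masked."""
    cells = torch.floor(points[:, :2] / cell).long()             # (N, 2)
    uniq, inv = torch.unique(cells, dim=0, return_inverse=True)
    drop = torch.rand(uniq.size(0)) < mask_ratio                 # per cell
    return drop[inv]                                             # per point
```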
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
- BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection [40.45938603642747]
We propose a unified framework named BEV-LGKD to transfer the knowledge in the teacher-student manner.
Our method uses only LiDAR points to guide the KD between RGB models.
arXiv Detail & Related papers (2022-12-01T16:17:39Z)
- Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation [74.67594286008317]
This article addresses the problem of distilling knowledge from a large teacher model to a slim student network for LiDAR semantic segmentation.
We propose the Point-to-Voxel Knowledge Distillation (PVD), which transfers the hidden knowledge from both point level and voxel level.
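A minimal sketch of distilling at both granularities, assuming per-point and per-voxel class logits are already available from teacher and student; the temperature-scaled KL form is a standard KD choice, not necessarily PVD's exact loss:

```python
import torch.nn.functional as F

def point_and_voxel_kd(stu_point, tea_point, stu_voxel, tea_voxel, T=2.0):
    """Temperature-scaled KL divergence at two levels.

    stu_point/tea_point: (N, K) per-point logits.
    stu_voxel/tea_voxel: (V, K) per-voxel logits (e.g. pooled per voxel)."""
    def kd(s, t):
        return F.kl_div(
            F.log_softmax(s / T, dim=-1),
            F.softmax(t.detach() / T, dim=-1),
            reduction="batchmean",
        ) * T * T
    return kd(stu_point, tea_point) + kd(stu_voxel, tea_voxel)
```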
arXiv Detail & Related papers (2022-06-05T05:28:32Z)
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [105.96557764248846]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework.
It unifies multi-modal features in the shared bird's-eye view representation space.
It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
arXiv Detail & Related papers (2022-05-26T17:59:35Z)
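To make the shared-BEV idea from the BEVFusion entry concrete, here is a minimal sketch of fusing camera and LiDAR features once both have been lifted onto the same BEV grid; the channel sizes and conv fuser are assumptions, not BEVFusion's actual architecture:

```python
import torch
import torch.nn as nn

class SharedBEVFusion(nn.Module):
    """Concatenate camera and LiDAR BEV features and mix them with a
    small convolutional fuser. Channel sizes here are placeholders."""

    def __init__(self, cam_ch=80, lidar_ch=256, out_ch=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_ch + lidar_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev, lidar_bev):
        # Both inputs share the same (B, *, H, W) BEV grid.
        return self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))
```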
This list is automatically generated from the titles and abstracts of the papers on this site.