LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using
Online Camera Distillation
- URL: http://arxiv.org/abs/2304.11379v2
- Date: Mon, 5 Jun 2023 03:56:19 GMT
- Title: LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using
Online Camera Distillation
- Authors: Song Wang and Wentong Li and Wenyu Liu and Xiaolu Liu and Jianke Zhu
- Abstract summary: Semantic map construction under bird's-eye view (BEV) plays an essential role in autonomous driving.
In this paper, we propose an effective LiDAR-based method to build semantic maps.
We introduce a BEV feature pyramid decoder that learns robust multi-scale BEV features for semantic map construction.
- Score: 21.53150795218778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic map construction under bird's-eye view (BEV) plays an essential role
in autonomous driving. In contrast to camera images, LiDAR inherently provides
accurate 3D observations from which the captured 3D features can be projected
onto the BEV space. However, the vanilla LiDAR-based BEV features often contain
considerable noise, since the spatial features carry little texture and few
semantic cues. In this paper, we propose an effective LiDAR-based method to
build semantic maps. Specifically, we introduce a BEV feature pyramid decoder
that learns robust multi-scale BEV features for semantic map construction,
which greatly boosts the accuracy of the LiDAR-based method. To mitigate the
defects caused by the lack of semantic cues in LiDAR data, we present an online
Camera-to-LiDAR distillation scheme to facilitate semantic learning from images
to point clouds. Our distillation scheme consists of feature-level and
logit-level distillation to absorb the semantic information from the camera in
BEV space. Experimental results on the challenging nuScenes dataset demonstrate
the efficacy of our proposed LiDAR2Map for semantic map construction: it
outperforms previous LiDAR-based methods by over 27.9% mIoU and even performs
better than the state-of-the-art camera-based approaches. Source code is
available at: https://github.com/songw-zju/LiDAR2Map.
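The two-level distillation scheme can be sketched in miniature. The toy, per-BEV-cell version below uses MSE for the feature-level term and temperature-softened KL divergence for the logit-level term; the abstract only names the two levels, so the specific losses, function names, and temperature value are illustrative assumptions, not the paper's exact formulation:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def feature_distill_loss(student_feat, teacher_feat):
    """Feature-level distillation: mean squared error between the
    student (LiDAR) and teacher (camera) BEV feature vectors."""
    n = len(student_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / n

def logit_distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Logit-level distillation: KL divergence from the teacher's
    softened class distribution to the student's, for one BEV cell."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy example: one BEV cell with a 4-dim feature and 3 semantic classes.
lidar_feat    = [0.2, 0.8, -0.1, 0.5]   # student BEV feature
camera_feat   = [0.3, 0.7,  0.0, 0.4]   # teacher BEV feature
lidar_logits  = [1.0, 0.5, -0.5]
camera_logits = [2.0, 0.2, -1.0]

loss = feature_distill_loss(lidar_feat, camera_feat) \
     + logit_distill_loss(lidar_logits, camera_logits)
```

In training, both terms would be summed over all BEV cells and added to the ordinary segmentation loss; the camera branch (teacher) is only needed during training, so inference remains LiDAR-only.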
Related papers
- Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection [52.66283064389691]
We propose a shelf-supervised approach for generating zero-shot 3D bounding boxes from paired RGB and LiDAR data.
We show that image-based shelf-supervision is helpful for training LiDAR-only and multi-modal (RGB + LiDAR) detectors.
arXiv Detail & Related papers (2024-06-14T15:21:57Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder in learning feature representations.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
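A BEV-guided masking strategy can be illustrated with a toy sketch that picks random BEV grid cells and drops the points that fall inside them; all names and parameters here are hypothetical, and the actual BEV-MAE method masks encoder features rather than raw points:

```python
import random

def bev_guided_mask(points, grid_size=4, cell=1.0, mask_ratio=0.5, seed=0):
    """Mask a point cloud by BEV cell: sample a fraction of BEV grid
    cells and drop every point whose (x, y) lands in a masked cell.
    A pre-training objective would then reconstruct the masked regions."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(grid_size) for j in range(grid_size)]
    masked = set(rng.sample(cells, int(mask_ratio * len(cells))))
    visible = [p for p in points
               if (int(p[0] // cell), int(p[1] // cell)) not in masked]
    return visible, masked

# One point per cell of a 4x4 BEV grid; half the cells get masked.
pts = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]
visible, masked = bev_guided_mask(pts)
```

Masking whole BEV cells (rather than individual points) is what makes the masking "BEV guided": the encoder must infer entire missing regions of the scene from their surroundings.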
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
- BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection [40.45938603642747]
We propose a unified framework named BEV-LGKD to transfer the knowledge in the teacher-student manner.
Our method only uses LiDAR points to guide the KD between RGB models.
arXiv Detail & Related papers (2022-12-01T16:17:39Z)
- BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection [17.526914782562528]
3D object detection from multiple image views is a challenging task for visual scene understanding.
We propose BEVDistill, a cross-modal BEV knowledge distillation framework for multi-view 3D object detection.
Our best model achieves 59.4 NDS on the nuScenes test leaderboard, achieving new state-of-the-art in comparison with various image-based detectors.
arXiv Detail & Related papers (2022-11-17T07:26:14Z)
- Efficient Spatial-Temporal Information Fusion for LiDAR-Based 3D Moving Object Segmentation [23.666607237164186]
We propose a novel deep neural network exploiting both spatial-temporal information and different representation modalities of LiDAR scans to improve LiDAR-MOS performance.
Specifically, we first use a range image-based dual-branch structure to separately deal with spatial and temporal information.
We also use a point refinement module via 3D sparse convolution to fuse the information from both LiDAR range image and point cloud representations.
arXiv Detail & Related papers (2022-07-05T17:59:17Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
- A Simple Baseline for BEV Perception Without LiDAR [37.00868568802673]
Building 3D perception systems for autonomous vehicles that do not rely on LiDAR is a critical research problem.
Current methods use multi-view RGB data collected from cameras around the vehicle.
We propose a simple baseline model, where the "lifting" step simply averages features from all projected image locations.
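The averaging "lifting" step can be sketched directly, assuming a precomputed projection map from each BEV cell to the image-feature locations it projects onto (the function name and data layout are hypothetical; real systems compute the projections from camera intrinsics and extrinsics):

```python
def lift_by_average(image_feats, projections, num_cells):
    """Simple BEV lifting: each BEV cell's feature is the plain average
    of the image features at all locations it projects to, with no depth
    estimates or learned weights."""
    bev = [0.0] * num_cells
    for cell, idxs in projections.items():
        if idxs:  # cells with no projected locations stay zero
            bev[cell] = sum(image_feats[i] for i in idxs) / len(idxs)
    return bev

# Cell 0 projects onto two image locations, cell 1 onto one.
feats = [2.0, 4.0, 6.0]
bev = lift_by_average(feats, {0: [0, 1], 1: [2]}, num_cells=2)
# bev == [3.0, 6.0]
```

The point of the baseline is that this unweighted average, with no depth distribution or attention, is already competitive, which makes it a useful reference when evaluating more elaborate lifting schemes.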
arXiv Detail & Related papers (2022-06-16T06:57:32Z)
- LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection [96.63947479020631]
In many real-world applications, the LiDAR sensors used by mass-produced robots and vehicles usually have fewer beams than those used to collect large-scale public datasets.
We propose the LiDAR Distillation to bridge the domain gap induced by different LiDAR beams for 3D object detection.
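One common way to expose a model to the beam-induced domain gap is to generate low-beam pseudo scans from high-beam data. The sketch below keeps an evenly spaced subset of beams; the function name and the even-spacing rule are illustrative assumptions, not the paper's exact downsampling procedure:

```python
def downsample_beams(points_by_beam, target_beams):
    """Simulate a lower-cost LiDAR by keeping an evenly spaced subset of
    beams from a high-beam scan (e.g. 64 beams -> 32 beams).
    points_by_beam: list of per-beam point lists, ordered by beam index."""
    src = len(points_by_beam)
    keep = {round(i * (src - 1) / (target_beams - 1))
            for i in range(target_beams)}
    return [pts for b, pts in enumerate(points_by_beam) if b in keep]

# A 64-beam scan reduced to a 32-beam pseudo scan.
scan = [[(0.0, 0.0, b * 0.1)] for b in range(64)]
sparse = downsample_beams(scan, 32)
```

Training a teacher on the dense scans and distilling into a student that sees only the pseudo low-beam scans lets the student transfer to real low-beam sensors.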
arXiv Detail & Related papers (2022-03-28T17:59:02Z)
- MonoDistill: Learning Spatial Features for Monocular 3D Object Detection [80.74622486604886]
We propose a simple and effective scheme to introduce the spatial information from LiDAR signals to the monocular 3D detectors.
We use the resulting data to train a 3D detector with the same architecture as the baseline model.
Experimental results show that the proposed method can significantly boost the performance of the baseline model.
arXiv Detail & Related papers (2022-01-26T09:21:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.