BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object
Detection
- URL: http://arxiv.org/abs/2211.09386v1
- Date: Thu, 17 Nov 2022 07:26:14 GMT
- Title: BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object
Detection
- Authors: Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinhong Jiang,
Feng Zhao
- Abstract summary: 3D object detection from multiple image views is a challenging task for visual scene understanding.
We propose BEVDistill, a cross-modal BEV knowledge distillation framework for multi-view 3D object detection.
Our best model achieves 59.4 NDS on the nuScenes test leaderboard, setting a new state of the art among image-based detectors.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D object detection from multiple image views is a fundamental and
challenging task for visual scene understanding. Owing to its low cost and high
efficiency, multi-view 3D object detection has demonstrated promising
application prospects. However, accurately detecting objects through
perspective views is extremely difficult due to the lack of depth information.
Current approaches tend to adopt heavy backbones for image encoders, making
them impractical for real-world deployment. Unlike images, LiDAR points
provide superior spatial cues, enabling highly precise localization. In this
paper, we explore the incorporation of LiDAR-based detectors for multi-view 3D
object detection. Instead of directly training a depth prediction network, we
unify the image and LiDAR features in the Bird's-Eye-View (BEV) space and
adaptively transfer knowledge across non-homogeneous representations in a
teacher-student paradigm. To this end, we propose BEVDistill, a cross-modal
BEV knowledge distillation (KD) framework for multi-view 3D object detection.
Extensive experiments demonstrate
that the proposed method outperforms current KD approaches on a
highly competitive baseline, BEVFormer, without introducing any extra cost in
the inference phase. Notably, our best model achieves 59.4 NDS on the nuScenes
test leaderboard, setting a new state of the art among image-based detectors.
Code will be available at
https://github.com/zehuichen123/BEVDistill.
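To make the teacher-student transfer concrete, here is a minimal PyTorch sketch of dense BEV feature distillation. It is a hypothetical illustration, not the official BEVDistill implementation: the module name, channel widths, and the plain masked-MSE imitation loss are assumptions standing in for the paper's adaptive cross-modal transfer.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BEVFeatureDistiller(nn.Module):
        # Aligns the camera student's BEV features to a frozen LiDAR teacher's
        # BEV features. A 1x1 conv bridges the channel gap so the two
        # non-homogeneous representations become directly comparable.
        def __init__(self, student_channels, teacher_channels):
            super().__init__()
            self.align = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

        def forward(self, student_bev, teacher_bev, fg_mask=None):
            # student_bev: (B, C_s, H, W) BEV map from the multi-view image branch
            # teacher_bev: (B, C_t, H, W) BEV map from the LiDAR detector
            pred = self.align(student_bev)
            # detach(): the teacher is frozen and receives no gradient.
            loss = F.mse_loss(pred, teacher_bev.detach(), reduction="none")
            if fg_mask is not None:
                # Optional (B, 1, H, W) weights that focus imitation on
                # foreground (object) cells rather than empty background.
                loss = loss * fg_mask
            return loss.mean()

    # Usage: add the distillation term to the detector's usual training loss.
    distiller = BEVFeatureDistiller(student_channels=256, teacher_channels=384)
    student_bev = torch.randn(2, 256, 128, 128)  # camera branch, e.g. BEVFormer
    teacher_bev = torch.randn(2, 384, 128, 128)  # LiDAR teacher
    kd_loss = distiller(student_bev, teacher_bev)

Because the loss is computed only during training, the deployed camera-only detector is unchanged, which is how this style of distillation adds no extra cost at inference time.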
Related papers
- SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection
We propose a novel approach called SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection) to improve the accuracy of 3D object detection.
Our results show that SOGDet consistently enhances the performance of three baseline methods in terms of nuScenes Detection Score (NDS) and mean Average Precision (mAP).
This indicates that combining 3D object detection with 3D semantic occupancy yields a more comprehensive perception of the 3D environment, thereby helping to build more robust autonomous driving systems.
arXiv Detail & Related papers (2023-08-26T07:38:21Z) - Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with gains of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z) - GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling
for Multi-view 3D Understanding [42.780417042750315]
Multi-view camera-based 3D detection is a challenging problem in computer vision.
Recent works leverage a pretrained LiDAR detection model to transfer knowledge to a camera-based student network.
We propose Enhanced Geometry Masked Image Modeling (GeoMIM) to transfer the knowledge of the LiDAR model in a pretrain-finetune paradigm.
arXiv Detail & Related papers (2023-03-20T17:59:03Z) - Unleash the Potential of Image Branch for Cross-modal 3D Object
Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z) - OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for
Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z) - BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for
BEV 3D Object Detection [40.45938603642747]
We propose a unified framework named BEV-LGKD to transfer the knowledge in the teacher-student manner.
Our method only uses LiDAR points to guide the KD between RGB models.
arXiv Detail & Related papers (2022-12-01T16:17:39Z) - Structured Knowledge Distillation Towards Efficient and Compact
Multi-View 3D Detection [30.74309289544479]
We propose a structured knowledge distillation framework to improve the efficiency of vision-only BEV detection models.
Experimental results show that our method leads to an average improvement of 2.16 mAP and 2.27 NDS on the nuScenes benchmark.
arXiv Detail & Related papers (2022-11-14T12:51:17Z) - CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z) - A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z) - Paint and Distill: Boosting 3D Object Detection with Semantic Passing
Network [70.53093934205057]
The 3D object detection task from LiDAR or camera sensors is essential for autonomous driving.
We propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models.
arXiv Detail & Related papers (2022-07-12T12:35:34Z)