TiGDistill-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning Distillation
- URL: http://arxiv.org/abs/2412.20911v1
- Date: Mon, 30 Dec 2024 12:44:20 GMT
- Title: TiGDistill-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning Distillation
- Authors: Shaoqing Xu, Fang Li, Peixiang Huang, Ziying Song, Zhi-Xin Yang,
- Abstract summary: TiGDistill-BEV is a novel approach to bridge the gap between LiDAR and camera data representations.
Our method distills knowledge from diverse modalities as the teacher model to a camera-based student detector.
Experiments on the nuScenes benchmark demonstrate that TiGDistill-BEV significantly boosts camera-based only detectors achieving a state-of-the-art with 62.8% NDS.
- Score: 6.34398347558641
- License:
- Abstract: Accurate multi-view 3D object detection is essential for applications such as autonomous driving. Researchers have consistently aimed to leverage LiDAR's precise spatial information to enhance camera-based detectors through methods like depth supervision and bird-eye-view (BEV) feature distillation. However, existing approaches often face challenges due to the inherent differences between LiDAR and camera data representations. In this paper, we introduce the TiGDistill-BEV, a novel approach that effectively bridges this gap by leveraging the strengths of both sensors. Our method distills knowledge from diverse modalities(e.g., LiDAR) as the teacher model to a camera-based student detector, utilizing the Target Inner-Geometry learning scheme to enhance camera-based BEV detectors through both depth and BEV features by leveraging diverse modalities. Specially, we propose two key modules: an inner-depth supervision module to learn the low-level relative depth relations within objects which equips detectors with a deeper understanding of object-level spatial structures, and an inner-feature BEV distillation module to transfer high-level semantics of different key points within foreground targets. To further alleviate the domain gap, we incorporate both inter-channel and inter-keypoint distillation to model feature similarity. Extensive experiments on the nuScenes benchmark demonstrate that TiGDistill-BEV significantly boosts camera-based only detectors achieving a state-of-the-art with 62.8% NDS and surpassing previous methods by a significant margin. The codes is available at: https://github.com/Public-BOTs/TiGDistill-BEV.git.
Related papers
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM- Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z) - VirtualPainting: Addressing Sparsity with Virtual Points and
Distance-Aware Data Augmentation for 3D Object Detection [3.5259183508202976]
We present an innovative approach that involves the generation of virtual LiDAR points using camera images.
We also enhance these virtual points with semantic labels obtained from image-based segmentation networks.
Our approach offers a versatile solution that can be seamlessly integrated into various 3D frameworks and 2D semantic segmentation methods.
arXiv Detail & Related papers (2023-12-26T18:03:05Z) - Instance-aware Multi-Camera 3D Object Detection with Structural Priors
Mining and Self-Boosting Learning [93.71280187657831]
Camera-based bird-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z) - DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal
Knowledge Distillation [25.933070263556374]
3D perception based on representations learned from multi-camera bird's-eye-view (BEV) is trending as cameras are cost-effective for mass production in autonomous driving industry.
There exists a distinct performance gap between multi-camera BEV and LiDAR based 3D object detection.
We propose to boost the representation learning of a multi-camera BEV based student detector by training it to imitate the features of a well-trained LiDAR based teacher detector.
arXiv Detail & Related papers (2023-09-26T17:56:21Z) - SimDistill: Simulated Multi-modal Distillation for BEV 3D Object
Detection [56.24700754048067]
Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging.
We propose a Simulated multi-modal Distillation (SimDistill) method by carefully crafting the model architecture and distillation strategy.
Our SimDistill can learn better feature representations for 3D object detection while maintaining a cost-effective camera-only deployment.
arXiv Detail & Related papers (2023-03-29T16:08:59Z) - OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for
Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z) - TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry
Learning [7.6887888234987125]
We propose a learning scheme of Target Inner-Geometry from the LiDAR modality into camera-based BEV detectors.
TiG-BEV can effectively boost BEVDepth by +2.3% NDS and +2.4% mAP, along with BEVDet by +9.1% NDS and +10.3% mAP on nuScenes val set.
arXiv Detail & Related papers (2022-12-28T17:53:43Z) - BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for
BEV 3D Object Detection [40.45938603642747]
We propose a unified framework named BEV-LGKD to transfer the knowledge in the teacher-student manner.
Our method only uses LiDAR points to guide the KD between RGB models.
arXiv Detail & Related papers (2022-12-01T16:17:39Z) - BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object
Detection [17.526914782562528]
3D object detection from multiple image views is a challenging task for visual scene understanding.
We propose textbfBEVDistill, a cross-modal BEV knowledge distillation framework for multi-view 3D object detection.
Our best model achieves 59.4 NDS on the nuScenes test leaderboard, achieving new state-of-the-art in comparison with various image-based detectors.
arXiv Detail & Related papers (2022-11-17T07:26:14Z) - A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z) - Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.