MonoSKD: General Distillation Framework for Monocular 3D Object
Detection via Spearman Correlation Coefficient
- URL: http://arxiv.org/abs/2310.11316v1
- Date: Tue, 17 Oct 2023 14:48:02 GMT
- Title: MonoSKD: General Distillation Framework for Monocular 3D Object
Detection via Spearman Correlation Coefficient
- Authors: Sen Wang, Jin Zheng
- Abstract summary: Existing monocular 3D detection knowledge distillation methods usually project the LiDAR point cloud onto the image plane and train the teacher network accordingly.
We propose MonoSKD, a novel knowledge distillation framework for monocular 3D detection based on the Spearman correlation coefficient.
Our framework achieves state-of-the-art performance as of submission with no additional inference computational cost.
- Score: 11.48914285491747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D object detection is an inherently ill-posed problem, as it is
challenging to predict accurate 3D localization from a single image. Existing
monocular 3D detection knowledge distillation methods usually project the LiDAR
point cloud onto the image plane and train the teacher network accordingly.
Transferring knowledge from LiDAR-based models to RGB-based models is more
complex, so a general distillation strategy is needed. To alleviate the
cross-modal problem, we propose
MonoSKD, a novel Knowledge Distillation framework for Monocular 3D detection
based on Spearman correlation coefficient, to learn the relative correlation
between cross-modal features. Considering the large gap between these features,
strict alignment of features may mislead the training, so we propose a looser
Spearman loss. Furthermore, by selecting appropriate distillation locations and
removing redundant modules, our scheme saves more GPU resources and trains
faster than existing methods. Extensive experiments are performed to verify the
effectiveness of our framework on the challenging KITTI 3D object detection
benchmark. Our method achieves state-of-the-art performance as of submission
with no additional inference computational cost. Our codes are available at
https://github.com/Senwang98/MonoSKD
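To make the core idea concrete, the sketch below shows one way a Spearman-style distillation loss can be made differentiable: soft ranks are computed with pairwise sigmoids, and the loss is one minus the Pearson correlation of the ranked student and teacher features. This is a minimal illustration, not the authors' implementation (see the repository above); the `tau` temperature and the pairwise-sigmoid ranking are assumptions made here for clarity.

```python
# Minimal sketch of a Spearman-style distillation loss (PyTorch).
# Not the MonoSKD code; the soft-rank formulation and `tau` are illustrative.
import torch

def soft_rank(x: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Differentiable rank approximation via pairwise sigmoids.
    x: (N, D) features; ranks are computed per row over D."""
    diff = x.unsqueeze(-1) - x.unsqueeze(-2)      # (N, D, D) pairwise differences
    return torch.sigmoid(diff / tau).sum(dim=-1)  # soft count of smaller entries

def spearman_distill_loss(student: torch.Tensor, teacher: torch.Tensor,
                          tau: float = 1.0) -> torch.Tensor:
    """1 - Spearman correlation between student and teacher feature rankings.
    The teacher is detached, so gradients flow to the student only."""
    rs = soft_rank(student, tau)
    rt = soft_rank(teacher.detach(), tau)
    rs = rs - rs.mean(dim=-1, keepdim=True)       # center the ranks
    rt = rt - rt.mean(dim=-1, keepdim=True)
    corr = (rs * rt).sum(-1) / (rs.norm(dim=-1) * rt.norm(dim=-1) + 1e-8)
    return (1.0 - corr).mean()

# Usage: flatten per-location feature maps to (N, D) before calling.
s = torch.randn(8, 64, requires_grad=True)        # student features
t = torch.randn(8, 64)                            # cross-modal teacher features
loss = spearman_distill_loss(s, t)
loss.backward()
```

Because the loss depends only on the relative ordering of feature values, not their magnitudes, it imposes a looser constraint than strict L2 feature alignment, which matches the paper's motivation for bridging the cross-modal gap.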
Related papers
- MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection [42.4932760909941]
Monocular 3D object detection is an indispensable research topic in autonomous driving.
The challenges of Mono3D lie in understanding 3D scene geometry and reconstructing 3D object information from a single image.
Previous methods attempted to transfer 3D information directly from the LiDAR-based teacher to the camera-based student.
arXiv Detail & Related papers (2024-04-07T10:39:04Z)
- ODM3D: Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection [15.204935788297226]
The ODM3D framework entails cross-modal knowledge distillation at various levels to inject LiDAR-domain knowledge into a monocular detector during training.
By identifying foreground sparsity as the main culprit behind existing methods' suboptimal training, we exploit the precise localisation information embedded in LiDAR points.
Our method ranks 1st in both KITTI validation and test benchmarks, significantly surpassing all existing monocular methods, supervised or semi-supervised.
arXiv Detail & Related papers (2023-10-28T07:12:09Z)
- SimDistill: Simulated Multi-modal Distillation for BEV 3D Object Detection [56.24700754048067]
Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging.
We propose a Simulated multi-modal Distillation (SimDistill) method by carefully crafting the model architecture and distillation strategy.
Our SimDistill can learn better feature representations for 3D object detection while maintaining a cost-effective camera-only deployment.
arXiv Detail & Related papers (2023-03-29T16:08:59Z)
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in autonomous driving due to its ease of application.
Most current methods still rely on 3D point cloud data for labeling the ground truth used in the training phase.
We propose a new weakly supervised monocular 3D object detection method, which can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z)
- Attention-Based Depth Distillation with 3D-Aware Positional Encoding for Monocular 3D Object Detection [10.84784828447741]
ADD is an Attention-based Depth knowledge Distillation framework with 3D-aware positional encoding.
Thanks to our teacher design, our framework is seamless, domain-gap free, easily implementable, and compatible with object-wise ground-truth depth.
We implement our framework on three representative monocular detectors, and we achieve state-of-the-art performance with no additional inference computational cost.
arXiv Detail & Related papers (2022-11-30T06:39:25Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, which exploits both 2D and 3D information, is proposed to achieve this goal.
Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- MonoDistill: Learning Spatial Features for Monocular 3D Object Detection [80.74622486604886]
We propose a simple and effective scheme to introduce spatial information from LiDAR signals into monocular 3D detectors.
The LiDAR signals are projected onto the image plane and aligned with the RGB images, and the resulting data are used to train a 3D detector with the same architecture as the baseline model.
Experimental results show that the proposed method can significantly boost the performance of the baseline model.
arXiv Detail & Related papers (2022-01-26T09:21:41Z)
- Delving into Localization Errors for Monocular 3D Object Detection [85.77319416168362]
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving.
In this work, we quantify the impact introduced by each sub-task and find that localization error is the vital factor restricting monocular 3D detection.
arXiv Detail & Related papers (2021-03-30T10:38:01Z)