IMKD: Intensity-Aware Multi-Level Knowledge Distillation for Camera-Radar Fusion
- URL: http://arxiv.org/abs/2512.15581v1
- Date: Wed, 17 Dec 2025 16:40:52 GMT
- Title: IMKD: Intensity-Aware Multi-Level Knowledge Distillation for Camera-Radar Fusion
- Authors: Shashank Mishra, Karan Patil, Didier Stricker, Jason Rambach
- Abstract summary: High-performance Radar-Camera 3D object detection can be achieved by leveraging knowledge distillation without using LiDAR at inference time. Existing distillation methods typically transfer modality-specific features directly to each sensor, which can distort their unique characteristics and degrade their individual strengths. We introduce IMKD, a radar-camera fusion framework based on multi-level knowledge distillation that preserves each sensor's intrinsic characteristics while amplifying their complementary strengths.
- Score: 14.54325436617196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-performance Radar-Camera 3D object detection can be achieved by leveraging knowledge distillation without using LiDAR at inference time. However, existing distillation methods typically transfer modality-specific features directly to each sensor, which can distort their unique characteristics and degrade their individual strengths. To address this, we introduce IMKD, a radar-camera fusion framework based on multi-level knowledge distillation that preserves each sensor's intrinsic characteristics while amplifying their complementary strengths. IMKD applies a three-stage, intensity-aware distillation strategy to enrich the fused representation across the architecture: (1) LiDAR-to-Radar intensity-aware feature distillation to enhance radar representations with fine-grained structural cues, (2) LiDAR-to-Fused feature intensity-guided distillation to selectively highlight useful geometry and depth information at the fusion level, fostering complementarity between the modalities rather than forcing them to align, and (3) Camera-Radar intensity-guided fusion mechanism that facilitates effective feature alignment and calibration. Extensive experiments on the nuScenes benchmark show that IMKD reaches 67.0% NDS and 61.0% mAP, outperforming all prior distillation-based radar-camera fusion methods. Our code and models are available at https://github.com/dfki-av/IMKD/.
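The abstract describes the three distillation stages only at a high level. As a rough illustration of stage (1), a LiDAR-to-radar feature distillation loss can be weighted per BEV cell by LiDAR return intensity; the PyTorch sketch below is one plausible reading of "intensity-aware" under that assumption (all tensor shapes and names are illustrative, not taken from the released code):

```python
import torch

def intensity_aware_distill_loss(student_bev: torch.Tensor,
                                 teacher_bev: torch.Tensor,
                                 intensity_map: torch.Tensor,
                                 eps: float = 1e-6) -> torch.Tensor:
    """Illustrative intensity-weighted feature distillation loss.

    student_bev:   (B, C, H, W) radar (or fused) BEV features
    teacher_bev:   (B, C, H, W) LiDAR teacher BEV features
    intensity_map: (B, 1, H, W) rasterized LiDAR return intensity per cell
    """
    # Normalize intensities to [0, 1] so they act as soft per-cell weights.
    w = intensity_map / (intensity_map.amax(dim=(2, 3), keepdim=True) + eps)
    # Per-cell squared feature error, averaged over channels.
    err = (student_bev - teacher_bev).pow(2).mean(dim=1, keepdim=True)
    # Weighted mean: cells with strong LiDAR returns dominate the loss.
    return (w * err).sum() / (w.sum() + eps)
```

Compared with a uniform MSE, this weighting concentrates the matching pressure on cells backed by strong LiDAR returns and leaves radar features elsewhere largely untouched, which matches the abstract's goal of preserving each sensor's intrinsic characteristics.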
Related papers
- RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion [21.686343737103627]
Radar-camera fusion methods have emerged as a cost-effective approach for 3D object detection but lag behind LiDAR-based methods in performance.
Recent works have focused on employing temporal fusion and Knowledge Distillation strategies to overcome these limitations.
We propose RCTDistill, a novel cross-modal KD method based on temporal fusion, comprising three key modules.
arXiv Detail & Related papers (2025-09-22T12:49:49Z)
- LiRCDepth: Lightweight Radar-Camera Depth Estimation via Knowledge Distillation and Uncertainty Guidance [5.9796425689255255]
LiRCDepth is a lightweight radar-camera depth estimation model.
We incorporate knowledge distillation to enhance the training process.
The model achieves a 6.6% improvement in MAE on the nuScenes dataset.
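The summary does not spell out how the uncertainty guidance enters the training objective. One common pattern, sketched below purely as an assumption, is to down-weight the per-pixel distillation term where the teacher's predicted uncertainty is high (names and shapes are illustrative):

```python
import torch

def uncertainty_guided_distill(student_depth: torch.Tensor,
                               teacher_depth: torch.Tensor,
                               teacher_logvar: torch.Tensor,
                               eps: float = 1e-6) -> torch.Tensor:
    """Illustrative uncertainty-guided depth distillation.

    All tensors are (B, 1, H, W); teacher_logvar is the teacher's predicted
    log-variance, so exp(-logvar) is high where the teacher is confident.
    """
    conf = torch.exp(-teacher_logvar)            # per-pixel confidence weight
    l1 = (student_depth - teacher_depth).abs()   # per-pixel discrepancy
    # Weighted mean: pixels where the teacher is uncertain are down-weighted.
    return (conf * l1).sum() / (conf.sum() + eps)
```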
arXiv Detail & Related papers (2024-12-20T22:34:47Z)
- SCKD: Semi-Supervised Cross-Modality Knowledge Distillation for 4D Radar Object Detection [16.127926058992237]
We propose a novel Semi-supervised Cross-modality Knowledge Distillation (SCKD) method for 4D radar-based 3D object detection.
It enables the student to learn features from a LiDAR-radar-fused teacher network through semi-supervised distillation.
With the same network structure, our radar-only student trained by SCKD boosts the mAP by 10.38% over the baseline.
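As a hedged sketch of how such a semi-supervised setup can be wired (the exact SCKD losses are not given in this summary): the distillation term against the fused teacher requires no annotations and can be computed on every frame, while the detection loss applies only to labeled frames.

```python
from typing import Optional

import torch
import torch.nn.functional as F

def semi_supervised_kd_loss(student_feat: torch.Tensor,
                            teacher_feat: torch.Tensor,
                            detection_loss: Optional[torch.Tensor],
                            kd_weight: float = 1.0) -> torch.Tensor:
    """Illustrative semi-supervised distillation objective.

    The feature-distillation term needs no annotations, so unlabeled frames
    still contribute; the supervised detection loss is added only when
    ground-truth labels exist (detection_loss is not None).
    """
    kd = F.mse_loss(student_feat, teacher_feat)  # mimic the fused teacher
    if detection_loss is None:                   # unlabeled frame
        return kd_weight * kd
    return detection_loss + kd_weight * kd       # labeled frame
```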
arXiv Detail & Related papers (2024-12-19T06:42:25Z)
- RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features [15.686167262542297]
RadarDistill is a knowledge distillation (KD) method that improves the representation of radar data by leveraging LiDAR data.
RadarDistill successfully transfers desirable characteristics of LiDAR features into radar features using three key components.
Our comparative analyses on the nuScenes dataset demonstrate that RadarDistill achieves state-of-the-art (SOTA) performance for the radar-only object detection task.
arXiv Detail & Related papers (2024-03-08T05:15:48Z)
- Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection [66.74183705987276]
We introduce a framework to improve the camera-only apprentice model, including an apprentice-friendly multi-modal expert and temporal-fusion-friendly distillation supervision.
With those improvements, our camera-only apprentice VCD-A sets new state-of-the-art on nuScenes with a score of 63.1% NDS.
arXiv Detail & Related papers (2023-10-24T09:29:26Z)
- ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking [26.976216624424385]
3D multi-object tracking (MOT) is essential for an autonomous mobile agent to safely navigate a scene.
We aim to develop a 3D MOT framework that fuses camera and LiDAR sensor information.
arXiv Detail & Related papers (2023-10-04T02:17:59Z)
- Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection [78.59426158981108]
We introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the challenges and improve 3D detection for dynamic objects.
We conduct extensive experiments on nuScenes and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art performance for detecting dynamic objects.
arXiv Detail & Related papers (2023-06-02T10:57:41Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
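The "lifting" operation shared by this line of work can be illustrated independently of MSMDFusion's specifics: given per-pixel depth and camera intrinsics, each pixel is unprojected to a 3D point that carries its image feature into the shared space. A minimal sketch assuming a pinhole camera model (function and variable names are illustrative):

```python
import torch

def lift_pixels_to_3d(depth: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    """Unproject an (H, W) depth map to camera-frame 3D points using the
    3x3 intrinsic matrix K. Image features indexed the same way can then
    be scattered into a shared 3D or BEV space for fusion.
    """
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=depth.dtype),
                          torch.arange(W, dtype=depth.dtype), indexing="ij")
    # Homogeneous pixel coordinates (u, v, 1), one row per pixel.
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).reshape(-1, 3)
    # X = depth * K^-1 [u, v, 1]^T for every pixel.
    pts = (torch.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    return pts  # (H*W, 3) points in the camera coordinate frame
```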
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
- LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation [78.74202673902303]
We propose a coarse-to-fine LiDAR and camera fusion-based network (termed LIF-Seg) for LiDAR segmentation.
The proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy.
The cooperation of these two components leads to effective camera-LiDAR fusion.
arXiv Detail & Related papers (2021-08-17T08:53:11Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
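The summary mentions residual-based fusion modules without further detail. Assuming the two streams produce spatially aligned feature maps with equal channel counts, a minimal residual fusion block could look like the following (illustrative, not the authors' code):

```python
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    """Minimal residual-based fusion of two aligned feature streams."""

    def __init__(self, channels: int):
        super().__init__()
        # Residual branch: mixes both modalities into a correction term.
        self.mix = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # Identity path preserves feat_a; the residual adds what the other
        # modality contributes.
        return feat_a + self.mix(torch.cat([feat_a, feat_b], dim=1))
```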
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Depth Estimation from Monocular Images and Sparse Radar Data [93.70524512061318]
In this paper, we explore the possibility of achieving a more accurate depth estimation by fusing monocular images and Radar points using a deep neural network.
We find that the noise present in Radar measurements is one of the main reasons that existing fusion methods cannot be applied directly.
The experiments are conducted on the nuScenes dataset, one of the first datasets to feature Camera, Radar, and LiDAR recordings in diverse scenes and weather conditions.
arXiv Detail & Related papers (2020-09-30T19:01:33Z)