RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion
- URL: http://arxiv.org/abs/2509.17712v1
- Date: Mon, 22 Sep 2025 12:49:49 GMT
- Title: RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion
- Authors: Geonho Bang, Minjae Seong, Jisong Kim, Geunju Baek, Daye Oh, Junhyung Kim, Junho Koh, Jun Won Choi
- Abstract summary: Radar-camera fusion methods have emerged as a cost-effective approach for 3D object detection but lag behind LiDAR-based methods in performance. Recent works have focused on employing temporal fusion and Knowledge Distillation strategies to overcome these limitations. We propose RCTDistill, a novel cross-modal KD method based on temporal fusion, comprising three key modules.
- Score: 21.686343737103627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Radar-camera fusion methods have emerged as a cost-effective approach for 3D object detection but still lag behind LiDAR-based methods in performance. Recent works have focused on employing temporal fusion and Knowledge Distillation (KD) strategies to overcome these limitations. However, existing approaches have not sufficiently accounted for uncertainties arising from object motion or sensor-specific errors inherent in radar and camera modalities. In this work, we propose RCTDistill, a novel cross-modal KD method based on temporal fusion, comprising three key modules: Range-Azimuth Knowledge Distillation (RAKD), Temporal Knowledge Distillation (TKD), and Region-Decoupled Knowledge Distillation (RDKD). RAKD is designed to consider the inherent errors in the range and azimuth directions, enabling effective knowledge transfer from LiDAR features to refine inaccurate BEV representations. TKD mitigates temporal misalignment caused by dynamic objects by aligning historical radar-camera BEV features with current LiDAR representations. RDKD enhances feature discrimination by distilling relational knowledge from the teacher model, allowing the student to differentiate foreground and background features. RCTDistill achieves state-of-the-art radar-camera fusion performance on both the nuScenes and View-of-Delft (VoD) datasets, with the fastest inference speed of 26.2 FPS.
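The abstract specifies what each module targets but not how the losses are computed. Below is a minimal PyTorch sketch of what three such distillation terms could look like; all tensor shapes, the error-weighting map, and the foreground mask are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def rakd_loss(student_bev, teacher_bev, range_azimuth_weight):
    """RAKD-style term (sketch): LiDAR-teacher-to-student feature
    mimicking on BEV maps of shape (B, C, H, W), weighted per cell by a
    hypothetical (B, 1, H, W) map reflecting larger expected radar
    error along range and azimuth."""
    return (range_azimuth_weight * (student_bev - teacher_bev).pow(2)).mean()

def tkd_loss(warped_history_bev, current_teacher_bev):
    """TKD-style term (sketch): historical radar-camera BEV features,
    warped into the current frame, are pulled toward the current LiDAR
    features, penalizing misalignment caused by dynamic objects."""
    return F.mse_loss(warped_history_bev, current_teacher_bev)

def rdkd_loss(student_bev, teacher_bev, fg_mask):
    """RDKD-style term (sketch): match pairwise feature affinities so
    the student separates foreground from background the way the
    teacher does. fg_mask: hypothetical (B, H*W) boolean mask."""
    B, C, H, W = student_bev.shape
    s = F.normalize(student_bev.flatten(2), dim=1)   # (B, C, H*W)
    t = F.normalize(teacher_bev.flatten(2), dim=1)
    s_aff = torch.bmm(s.transpose(1, 2), s)          # (B, H*W, H*W)
    t_aff = torch.bmm(t.transpose(1, 2), t)
    # Up-weight foreground-foreground pairs (an assumption).
    w = 1.0 + fg_mask.float().unsqueeze(2) * fg_mask.float().unsqueeze(1)
    return (w * (s_aff - t_aff).pow(2)).mean()
```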
Related papers
- IMKD: Intensity-Aware Multi-Level Knowledge Distillation for Camera-Radar Fusion [14.54325436617196]
High-performance Radar-Camera 3D object detection can be achieved by leveraging knowledge distillation without using LiDAR at inference time. Existing distillation methods typically transfer modality-specific features directly to each sensor, which can distort their unique characteristics and degrade their individual strengths. We introduce IMKD, a radar-camera fusion framework based on multi-level knowledge distillation that preserves each sensor's intrinsic characteristics while amplifying their complementary strengths.
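The summary contrasts direct feature transfer with multi-level distillation. A hedged sketch of the multi-level idea, assuming hypothetical channel sizes and 1x1 adapters (the paper's actual intensity-aware design is not described here):

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelKD(nn.Module):
    """Sketch: distill teacher features into a sensor branch at several
    depths through per-level adapters, so the student matches the
    teacher in an adapted space instead of overwriting its own
    modality-specific features. Channel sizes are assumptions."""
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        self.adapters = nn.ModuleList([nn.Conv2d(c, c, 1) for c in channels])

    def forward(self, student_feats, teacher_feats):
        losses = [F.mse_loss(adapt(s), t.detach())
                  for adapt, s, t in zip(self.adapters, student_feats, teacher_feats)]
        return sum(losses) / len(losses)
```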
arXiv Detail & Related papers (2025-12-17T16:40:52Z)
- RaLiFlow: Scene Flow Estimation with 4D Radar and LiDAR Point Clouds [10.906975408529895]
We build a Radar-LiDAR scene flow dataset based on a public real-world automotive dataset. We introduce RaLiFlow, the first joint scene flow learning framework for 4D radar and LiDAR. Our method outperforms existing LiDAR-based and radar-based single-modal methods by a significant margin.
arXiv Detail & Related papers (2025-12-11T07:41:33Z)
- LeAD-M3D: Leveraging Asymmetric Distillation for Real-time Monocular 3D Detection [72.97402509843484]
LeAD-M3D is a monocular 3D detector that achieves state-of-the-art accuracy and real-time inference without extra modalities. Asymmetric Augmentation Denoising Distillation (A2D2) transfers geometric knowledge from a clean-image teacher to a mixup-noised student. 3D-aware Consistent Matching (CM3D) improves prediction-to-ground-truth assignment. Confidence-Gated 3D Inference (CGI3D) accelerates detection by restricting expensive 3D regression to top-confidence regions.
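As a concrete reading of the A2D2 summary, the following sketch pairs a clean-image teacher with a mixup-noised student; the mixing coefficient, the feature-matching loss, and the model interfaces are all assumptions:

```python
import torch
import torch.nn.functional as F

def a2d2_step(student, teacher, img_a, img_b, lam=0.7):
    """Sketch: the teacher encodes the clean image, while the student
    sees a mixup-noised version and is trained to recover the teacher's
    features (a denoising-style distillation signal)."""
    mixed = lam * img_a + (1.0 - lam) * img_b   # mixup noise for the student
    with torch.no_grad():
        t_feat = teacher(img_a)                 # clean-image teacher
    s_feat = student(mixed)
    return F.mse_loss(s_feat, t_feat)
```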
arXiv Detail & Related papers (2025-12-05T12:08:18Z)
- Revisiting Radar Camera Alignment by Contrastive Learning for 3D Object Detection [31.69508809666884]
3D object detection algorithms based on radar and camera fusion have shown excellent performance. We propose a new alignment model called Radar Camera Alignment (RCAlign). Specifically, we design a Dual-Route Alignment (DRA) module based on contrastive learning to align and fuse the features between radar and camera. Considering the sparsity of radar BEV features, a Radar Feature Enhancement (RFE) module is proposed to densify the radar BEV features.
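A minimal sketch of a contrastive BEV alignment objective in the spirit of the DRA module, treating the same BEV cell across modalities as the positive pair; the temperature and tensor shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def bev_contrastive_alignment(radar_feat, cam_feat, temperature=0.07):
    """InfoNCE-style alignment between radar and camera BEV features of
    shape (B, C, H, W): each radar cell should match the camera cell at
    the same BEV location and repel all others."""
    B, C, H, W = radar_feat.shape
    r = F.normalize(radar_feat.flatten(2).transpose(1, 2), dim=-1)  # (B, HW, C)
    c = F.normalize(cam_feat.flatten(2).transpose(1, 2), dim=-1)
    logits = torch.bmm(r, c.transpose(1, 2)) / temperature          # (B, HW, HW)
    # Row i's positive is column i (same BEV cell, other modality).
    target = torch.arange(H * W, device=logits.device).repeat(B)
    return F.cross_entropy(logits.reshape(B * H * W, H * W), target)
```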
arXiv Detail & Related papers (2025-04-23T02:41:43Z)
- SCKD: Semi-Supervised Cross-Modality Knowledge Distillation for 4D Radar Object Detection [16.127926058992237]
We propose a novel Semi-supervised Cross-modality Knowledge Distillation (SCKD) method for 4D radar-based 3D object detection. The student learns features from a LiDAR-radar-fused teacher network through semi-supervised distillation. With the same network structure, our radar-only student trained by SCKD boosts the mAP by 10.38% over the baseline.
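The summary describes label supervision being complemented by a fused teacher. A hedged sketch of how such a semi-supervised objective could combine the two signals (names, shapes, and weighting are assumptions):

```python
import torch
import torch.nn.functional as F

def sckd_objective(student_feat, teacher_feat, det_loss, labeled, kd_weight=1.0):
    """Sketch: every frame contributes a feature-distillation term
    against the LiDAR-radar-fused teacher, while the per-sample
    detection loss det_loss (B,) only counts where the (B,) boolean
    mask `labeled` is set."""
    kd = F.mse_loss(student_feat, teacher_feat.detach())
    n_labeled = labeled.float().sum().clamp(min=1.0)
    supervised = (det_loss * labeled.float()).sum() / n_labeled
    return supervised + kd_weight * kd
```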
arXiv Detail & Related papers (2024-12-19T06:42:25Z)
- V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection [64.93675471780209]
We present V2X-R, the first simulated V2X dataset incorporating LiDAR, camera, and 4D radar. V2X-R contains 12,079 scenarios with 37,727 frames of LiDAR and 4D radar point clouds, 150,908 images, and 170,859 annotated 3D vehicle bounding boxes. We propose a novel cooperative LiDAR-4D radar fusion pipeline for 3D object detection and implement it with various fusion strategies.
arXiv Detail & Related papers (2024-11-13T07:41:47Z)
- Better Monocular 3D Detectors with LiDAR from the Past [64.6759926054061]
Camera-based 3D detectors often suffer inferior performance compared to LiDAR-based counterparts due to inherent depth ambiguities in images.
In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data.
We show consistent and significant performance gains across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.
arXiv Detail & Related papers (2024-04-08T01:38:43Z)
- RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features [15.686167262542297]
RadarDistill is a knowledge distillation (KD) method that improves the representation of radar data by leveraging LiDAR data. RadarDistill transfers desirable characteristics of LiDAR features into radar features using three key components. Our comparative analyses on the nuScenes dataset demonstrate that RadarDistill achieves state-of-the-art (SOTA) performance for the radar-only object detection task.
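The three components are not spelled out in this summary; one recurring ingredient in this line of work is distilling only where the teacher is active, and the sketch below illustrates that idea. The selection ratio is an assumption, not a detail from the paper:

```python
import torch
import torch.nn.functional as F

def active_region_kd(student_bev, teacher_bev, top_ratio=0.1):
    """Sketch: match the teacher only on its top-activation BEV cells,
    focusing the distillation budget on regions likely to contain
    objects. Inputs: (B, C, H, W)."""
    B, C, H, W = teacher_bev.shape
    activation = teacher_bev.abs().mean(dim=1).flatten(1)   # (B, H*W)
    k = max(1, int(top_ratio * activation.shape[1]))
    idx = activation.topk(k, dim=1).indices                 # (B, k)
    idx = idx.unsqueeze(1).expand(-1, C, -1)                # (B, C, k)
    s = student_bev.flatten(2).gather(2, idx)
    t = teacher_bev.flatten(2).gather(2, idx)
    return F.mse_loss(s, t.detach())
```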
arXiv Detail & Related papers (2024-03-08T05:15:48Z)
- Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection [78.59426158981108]
We introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the challenges and improve 3D detection for dynamic objects.
We conduct extensive experiments on nuScenes and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art performance for detecting dynamic objects.
arXiv Detail & Related papers (2023-06-02T10:57:41Z)
- Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
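Since the approach teaches the LiDAR detector to simulate the multimodal detector's features and responses, here is a hedged sketch of the response half; the temperature and loss mix are assumptions:

```python
import torch
import torch.nn.functional as F

def response_kd(student_cls, teacher_cls, student_box, teacher_box, T=2.0):
    """Sketch of response-level distillation: soften class logits of
    shape (N, num_classes) with temperature T and match distributions,
    plus a smooth-L1 term on the (N, box_dim) box regressions."""
    kl = F.kl_div(F.log_softmax(student_cls / T, dim=-1),
                  F.softmax(teacher_cls / T, dim=-1),
                  reduction="batchmean") * (T * T)
    reg = F.smooth_l1_loss(student_box, teacher_box)
    return kl + reg
```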
arXiv Detail & Related papers (2022-06-30T01:44:30Z)
- LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection [96.63947479020631]
In many real-world applications, the LiDAR points used by mass-produced robots and vehicles usually have fewer beams than those in large-scale public datasets.
We propose the LiDAR Distillation to bridge the domain gap induced by different LiDAR beams for 3D object detection.
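Bridging a beam-induced gap typically starts from generating pseudo low-beam point clouds; below is a sketch of that preprocessing step, where binning points into beams by equal vertical-angle intervals is an assumption (real sensors have non-uniform beam angles):

```python
import numpy as np

def downsample_beams(points, src_beams=64, dst_beams=32):
    """Sketch: assign each point to a beam by its vertical angle, then
    keep every (src_beams // dst_beams)-th beam to mimic a cheaper
    LiDAR. points: (N, >=3) array with x, y, z in the first columns."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    vert_angle = np.degrees(np.arctan2(z, np.hypot(x, y)))
    edges = np.linspace(vert_angle.min(), vert_angle.max() + 1e-6,
                        src_beams + 1)
    beam_id = np.digitize(vert_angle, edges) - 1
    keep = beam_id % (src_beams // dst_beams) == 0
    return points[keep]
```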
arXiv Detail & Related papers (2022-03-28T17:59:02Z)
- SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks [81.64530401885476]
We propose a self-supervised LiDAR odometry method, dubbed SelfVoxeLO, to tackle these two difficulties.
Specifically, we propose a 3D convolution network to process the raw LiDAR data directly, which extracts features that better encode the 3D geometric patterns.
We evaluate our method's performance on two large-scale datasets, i.e., KITTI and Apollo-SouthBay.
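The summary's key step is feeding raw LiDAR to a 3D convolutional network, which presupposes a voxelization stage; a sketch with assumed grid extents and a simple binary-occupancy encoding:

```python
import torch

def voxelize(points, voxel_size=0.2, grid=(240, 240, 32)):
    """Sketch: scatter (N, 3) points (meters) into a dense binary
    occupancy grid centered on the sensor, suitable as input to a 3D
    conv network. Grid size and encoding are assumptions."""
    bounds = torch.tensor(grid, device=points.device)
    idx = (points / voxel_size).long() + bounds // 2  # shift origin to grid center
    valid = ((idx >= 0) & (idx < bounds)).all(dim=1)
    vol = torch.zeros(tuple(grid), device=points.device)
    i = idx[valid]
    vol[i[:, 0], i[:, 1], i[:, 2]] = 1.0
    return vol.unsqueeze(0)  # (1, X, Y, Z): add a channel dim for conv3d
```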
arXiv Detail & Related papers (2020-10-19T09:23:39Z)