Related papers: CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation

URL: http://arxiv.org/abs/2403.19104v1
Date: Thu, 28 Mar 2024 02:39:45 GMT
Title: CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation
Authors: Lingjun Zhao, Jingyu Song, Katherine A. Skinner
Abstract summary: We propose Camera-Radar Knowledge Distillation (CRKD) to bridge the performance gap between LC and CR detectors with a novel cross-modality KD framework. To accommodate the unique cross-modality KD path, we propose four distillation losses to help the student learn crucial features from the teacher model. We present extensive evaluations on the nuScenes dataset to demonstrate the effectiveness of the proposed CRKD framework.
Score: 6.678224763527922
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the field of 3D object detection for autonomous driving, LiDAR-Camera (LC) fusion is the top-performing sensor configuration. Still, LiDAR is relatively high cost, which hinders adoption of this technology for consumer automobiles. Alternatively, camera and radar are commonly deployed on vehicles already on the road today, but performance of Camera-Radar (CR) fusion falls behind LC fusion. In this work, we propose Camera-Radar Knowledge Distillation (CRKD) to bridge the performance gap between LC and CR detectors with a novel cross-modality KD framework. We use the Bird's-Eye-View (BEV) representation as the shared feature space to enable effective knowledge distillation. To accommodate the unique cross-modality KD path, we propose four distillation losses to help the student learn crucial features from the teacher model. We present extensive evaluations on the nuScenes dataset to demonstrate the effectiveness of the proposed CRKD framework. The project page for CRKD is https://song-jingyu.github.io/CRKD.

Related papers

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [68.99784784185019]
Poor lighting or adverse weather conditions degrade camera performance. Radar suffers from noise and positional ambiguity. We propose RobuRCDet, a robust object detection model in BEV.
arXiv Detail & Related papers (2025-02-18T17:17:38Z)
SCKD: Semi-Supervised Cross-Modality Knowledge Distillation for 4D Radar Object Detection [16.127926058992237]
We propose a novel Semi-supervised Cross-modality Knowledge Distillation (SCKD) method for 4D radar-based 3D object detection. It characterizes the capability of learning the feature from a Lidar-radar-fused teacher network with semi-supervised distillation. With the same network structure, our radar-only student trained by SCKD boosts the mAP by 10.38% over the baseline.
arXiv Detail & Related papers (2024-12-19T06:42:25Z)
LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera [22.974481709303927]
3D radar is gradually replacing LiDAR in autonomous driving applications. We propose a three-stage tight fusion approach on BEV to realize a fusion framework for point clouds and images. Our approach enhances the performance in both radar-only (R-LiCROcc) and radar-camera (RC-LiCROcc) settings.
arXiv Detail & Related papers (2024-07-23T05:53:05Z)
Better Monocular 3D Detectors with LiDAR from the Past [64.6759926054061]
Camera-based 3D detectors often suffer inferior performance compared to LiDAR-based counterparts due to inherent depth ambiguities in images. In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data. We show consistent and significant performance gain across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.
arXiv Detail & Related papers (2024-04-08T01:38:43Z)
CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking [40.630532348405595]
Camera-RADAR 3D Detection and Tracking (CR3DT) is a camera-RADAR fusion model for 3D object detection and Multi-Object Tracking (MOT). Building upon the foundations of the State-of-the-Art (SotA) camera-only BEVDet architecture, CR3DT demonstrates substantial improvements in both detection and tracking capabilities.
arXiv Detail & Related papers (2024-03-22T16:06:05Z)
Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion [74.84019379368807]
We propose a novel method named EchoFusion to skip the existing radar signal processing pipeline. Specifically, we first generate the Bird's Eye View (BEV) queries and then take corresponding spectrum features from radar to fuse with other sensors.
arXiv Detail & Related papers (2023-07-31T09:53:50Z)
TransCAR: Transformer-based Camera-And-Radar Fusion for 3D Object Detection [13.986963122264633]
TransCAR is a Transformer-based Camera-And-Radar fusion solution for 3D object detection. Our model estimates a bounding box per query using set-to-set Hungarian loss.
arXiv Detail & Related papers (2023-04-30T05:35:03Z)
CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection [12.557361522985898]
We propose a camera-radar matching network CramNet to fuse the sensor readings from camera and radar in a joint 3D space. Our method supports training with sensor modality dropout, which leads to robust 3D object detection, even when a camera or radar sensor suddenly malfunctions on a vehicle.
arXiv Detail & Related papers (2022-10-17T17:18:47Z)
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR. Fusing these two modalities can significantly boost the performance of 3D perception models. We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions. TransFusion achieves state-of-the-art performance on large-scale datasets. We extend the proposed method to the 3D tracking task and achieve the 1st place in the leaderboard of nuScenes tracking.
arXiv Detail & Related papers (2022-03-22T07:15:13Z)
LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation [78.74202673902303]
We propose a coarse-to-fine LiDAR and camera fusion-based network (termed as LIF-Seg) for LiDAR segmentation. The proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy. The cooperation of these two components leads to the success of the effective camera-LiDAR fusion.
arXiv Detail & Related papers (2021-08-17T08:53:11Z)
Radar Camera Fusion via Representation Learning in Autonomous Driving [4.278336455989584]
Key to successful radar-camera fusion is accurate data association. Traditional rule-based association methods are susceptible to performance degradation in challenging scenarios and failure in corner cases. We propose to address rad-cam association via deep representation learning, to explore feature-level interaction and global reasoning.
arXiv Detail & Related papers (2021-03-14T01:32:03Z)
Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning [59.19469551774703]
Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image. We construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle. Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night.
arXiv Detail & Related papers (2020-03-05T05:29:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.