Related papers: Revisiting Radar Camera Alignment by Contrastive Learning for 3D Object Detection

Revisiting Radar Camera Alignment by Contrastive Learning for 3D Object Detection

URL: http://arxiv.org/abs/2504.16368v1
Date: Wed, 23 Apr 2025 02:41:43 GMT
Title: Revisiting Radar Camera Alignment by Contrastive Learning for 3D Object Detection
Authors: Linhua Kong, Dongxia Chang, Lian Liu, Zisen Kong, Pengyuan Li, Yao Zhao,
Abstract summary: 3D object detection algorithms based on radar and camera fusion have shown excellent performance.<n>We propose a new alignment model called Radar Camera Alignment (RCAlign)<n>Specifically, we design a Dual-Route Alignment (DRA) module based on contrastive learning to align and fuse the features between radar and camera.<n>Considering the sparsity of radar BEV features, a Radar Feature Enhancement (RFE) module is proposed to improve the densification of radar BEV features.
Score: 31.69508809666884
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, 3D object detection algorithms based on radar and camera fusion have shown excellent performance, setting the stage for their application in autonomous driving perception tasks. Existing methods have focused on dealing with feature misalignment caused by the domain gap between radar and camera. However, existing methods either neglect inter-modal features interaction during alignment or fail to effectively align features at the same spatial location across modalities. To alleviate the above problems, we propose a new alignment model called Radar Camera Alignment (RCAlign). Specifically, we design a Dual-Route Alignment (DRA) module based on contrastive learning to align and fuse the features between radar and camera. Moreover, considering the sparsity of radar BEV features, a Radar Feature Enhancement (RFE) module is proposed to improve the densification of radar BEV features with the knowledge distillation loss. Experiments show RCAlign achieves a new state-of-the-art on the public nuScenes benchmark in radar camera fusion for 3D Object Detection. Furthermore, the RCAlign achieves a significant performance gain (4.3\% NDS and 8.4\% mAP) in real-time 3D detection compared to the latest state-of-the-art method (RCBEVDet).

Related papers

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [68.99784784185019]
Poor lighting or adverse weather conditions degrade camera performance.<n>Radar suffers from noise and positional ambiguity.<n>We propose RobuRCDet, a robust object detection model in BEV.
arXiv Detail & Related papers (2025-02-18T17:17:38Z)
TransRAD: Retentive Vision Transformer for Enhanced Radar Object Detection [6.163747364795787]
We present TransRAD, a novel 3D radar object detection model. We propose Location-Aware NMS to mitigate the common issue of duplicate bounding boxes in deep radar object detection. Results demonstrate that TransRAD outperforms state-of-the-art methods in both 2D and 3D radar detection tasks.
arXiv Detail & Related papers (2025-01-29T20:21:41Z)
HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection [10.91039672865197]
Millimeter-wave radar plays a vital role in 3D object detection for autonomous driving. Radar point clouds suffer from pronounced sparsity and unavoidable angle estimation errors. Direct fusion of radar and camera data can lead to negative or even opposite effects.
arXiv Detail & Related papers (2024-12-16T07:06:17Z)
RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network [34.45694077040797]
We present a radar-camera fusion 3D object detection framework called BEEVDet. RadarBEVNet encodes sparse radar points into a dense bird's-eye-view feature. Our method achieves state-of-the-art radar-camera fusion results in 3D object detection, BEV semantic segmentation, and 3D multi-object tracking tasks.
arXiv Detail & Related papers (2024-09-08T05:14:27Z)
RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar [15.776076554141687]
3D occupancy-based perception pipeline has significantly advanced autonomous driving. Current methods rely on LiDAR or camera inputs for 3D occupancy prediction. We introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction.
arXiv Detail & Related papers (2024-05-22T21:48:17Z)
Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar [62.51065633674272]
We introduce Radar Fields - a neural scene reconstruction method designed for active radar imagers. Our approach unites an explicit, physics-informed sensor model with an implicit neural geometry and reflectance model to directly synthesize raw radar measurements. We validate the effectiveness of the method across diverse outdoor scenarios, including urban scenes with dense vehicles and infrastructure.
arXiv Detail & Related papers (2024-05-07T20:44:48Z)
Better Monocular 3D Detectors with LiDAR from the Past [64.6759926054061]
Camera-based 3D detectors often suffer inferior performance compared to LiDAR-based counterparts due to inherent depth ambiguities in images. In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data. We show consistent and significant performance gain across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.
arXiv Detail & Related papers (2024-04-08T01:38:43Z)
RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection [33.07575082922186]
Three-dimensional object detection is one of the key tasks in autonomous driving. relying solely on cameras is difficult to achieve highly accurate and robust 3D object detection. radar-camera fusion 3D object detection method in the bird's eye view (BEV) RadarBEVNet consists of a dual-stream radar backbone and a Radar Cross-Section (RC) aware BEV encoder.
arXiv Detail & Related papers (2024-03-25T06:02:05Z)
Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion [74.84019379368807]
We propose a novel method named EchoFusion to skip the existing radar signal processing pipeline. Specifically, we first generate the Bird's Eye View (BEV) queries and then take corresponding spectrum features from radar to fuse with other sensors.
arXiv Detail & Related papers (2023-07-31T09:53:50Z)
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection [78.59426158981108]
We introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the challenges and improve 3D detection for dynamic objects. We conduct extensive experiments on nuScenes and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art performance for detecting dynamic objects.
arXiv Detail & Related papers (2023-06-02T10:57:41Z)
Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes. We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
arXiv Detail & Related papers (2022-05-27T05:42:16Z)
RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects [73.80316195652493]
We tackle the problem of exploiting Radar for perception in the context of self-driving cars. We propose a new solution that exploits both LiDAR and Radar sensors for perception. Our approach, dubbed RadarNet, features a voxel-based early fusion and an attention-based late fusion.
arXiv Detail & Related papers (2020-07-28T17:15:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.