RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network
- URL: http://arxiv.org/abs/2409.04979v1
- Date: Sun, 8 Sep 2024 05:14:27 GMT
- Title: RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network
- Authors: Zhiwei Lin, Zhe Liu, Yongtao Wang, Le Zhang, Ce Zhu
- Abstract summary: We present a radar-camera fusion 3D object detection framework called RCBEVDet.
RadarBEVNet encodes sparse radar points into a dense bird's-eye-view feature.
Our method achieves state-of-the-art radar-camera fusion results in 3D object detection, BEV semantic segmentation, and 3D multi-object tracking tasks.
- Score: 34.45694077040797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Perceiving the surrounding environment is a fundamental task in autonomous driving. To obtain highly accurate perception results, modern autonomous driving systems typically employ multi-modal sensors to collect comprehensive environmental data. Among these, the radar-camera multi-modal perception system is especially favored for its excellent sensing capabilities and cost-effectiveness. However, the substantial modality differences between radar and camera sensors pose challenges in fusing information. To address this problem, this paper presents RCBEVDet, a radar-camera fusion 3D object detection framework. Specifically, RCBEVDet is developed from an existing camera-based 3D object detector, supplemented by a specially designed radar feature extractor, RadarBEVNet, and a Cross-Attention Multi-layer Fusion (CAMF) module. Firstly, RadarBEVNet encodes sparse radar points into a dense bird's-eye-view (BEV) feature using a dual-stream radar backbone and a Radar Cross Section aware BEV encoder. Secondly, the CAMF module utilizes a deformable attention mechanism to align radar and camera BEV features and adopts channel and spatial fusion layers to fuse them. To further enhance RCBEVDet's capabilities, we introduce RCBEVDet++, which advances the CAMF through sparse fusion, supports query-based multi-view camera perception models, and adapts to a broader range of perception tasks. Extensive experiments on the nuScenes dataset show that our method integrates seamlessly with existing camera-based 3D perception models and improves their performance across various perception tasks. Furthermore, our method achieves state-of-the-art radar-camera fusion results in 3D object detection, BEV semantic segmentation, and 3D multi-object tracking tasks. Notably, with ViT-L as the image backbone, RCBEVDet++ achieves 72.73 NDS and 67.34 mAP in 3D object detection without test-time augmentation or model ensembling.
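The CAMF idea described in the abstract lends itself to a compact sketch. Below is a minimal PyTorch sketch of the fusion step: camera BEV queries attend to radar BEV features for alignment, and the aligned features are then fused with channel and spatial gates. Plain multi-head attention stands in for the deformable attention used in the paper, and all shapes, module names, and hyperparameters are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn


class CrossModalAlignment(nn.Module):
    """Align radar BEV features to camera BEV features via cross-attention.

    Plain multi-head attention is used here as a stand-in for the
    deformable attention described in the paper."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, radar_bev, camera_bev):
        b, c, h, w = radar_bev.shape
        q = camera_bev.flatten(2).transpose(1, 2)   # [B, H*W, C] camera queries
        kv = radar_bev.flatten(2).transpose(1, 2)   # [B, H*W, C] radar keys/values
        aligned, _ = self.attn(q, kv, kv)           # camera queries attend to radar
        return aligned.transpose(1, 2).reshape(b, c, h, w)


class ChannelSpatialFusion(nn.Module):
    """Fuse aligned radar and camera BEV features with channel and spatial gates."""

    def __init__(self, channels: int):
        super().__init__()
        self.channel_gate = nn.Sequential(           # SE-style channel re-weighting
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(           # per-location re-weighting
            nn.Conv2d(2 * channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        self.out_proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, radar_bev, camera_bev):
        x = torch.cat([radar_bev, camera_bev], dim=1)  # [B, 2C, H, W]
        x = x * self.channel_gate(x)
        x = x * self.spatial_gate(x)
        return self.out_proj(x)                        # fused BEV feature [B, C, H, W]


# Toy usage with small BEV feature maps.
radar_bev = torch.randn(2, 64, 32, 32)
camera_bev = torch.randn(2, 64, 32, 32)
aligned_radar = CrossModalAlignment(64)(radar_bev, camera_bev)
fused = ChannelSpatialFusion(64)(aligned_radar, camera_bev)
print(fused.shape)  # torch.Size([2, 64, 32, 32])
```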
Related papers
- RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection [33.07575082922186]
Three-dimensional object detection is one of the key tasks in autonomous driving.
Relying solely on cameras makes it difficult to achieve highly accurate and robust 3D object detection.
RCBEVDet is a radar-camera fusion 3D object detection method in the bird's eye view (BEV).
RadarBEVNet consists of a dual-stream radar backbone and a Radar Cross-Section (RCS) aware BEV encoder.
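As a rough illustration of the RCS-aware BEV encoding idea, the sketch below scatters per-point radar features into a BEV grid and uses each point's radar cross section as a weighting prior. The grid resolution, feature layout, and sigmoid weighting are assumptions for illustration, not the paper's exact design.

```python
import torch


def radar_points_to_bev(points, feats, rcs, bev_size=(128, 128),
                        x_range=(-51.2, 51.2), y_range=(-51.2, 51.2)):
    """Scatter radar point features into a BEV grid, weighting each point by its RCS.

    points: [N, 2] (x, y) positions in metres.
    feats:  [N, C] per-point features (e.g. velocity, intensity, embeddings).
    rcs:    [N]    radar cross section values in dBsm.
    Returns a [C, H, W] BEV feature map.
    """
    H, W = bev_size
    C = feats.shape[1]
    # Map metric coordinates to integer grid indices.
    ix = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * W).long().clamp(0, W - 1)
    iy = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * H).long().clamp(0, H - 1)
    # RCS (in dB) turned into a positive weight; stronger reflectors contribute more.
    weight = torch.sigmoid(rcs / 10.0).unsqueeze(1)          # [N, 1]
    bev = torch.zeros(C, H, W)
    flat_idx = iy * W + ix                                    # [N]
    bev.view(C, -1).index_add_(1, flat_idx, (feats * weight).t())
    return bev


# Toy example: 500 random radar points with 8-dim features.
pts = torch.rand(500, 2) * 102.4 - 51.2
feats = torch.randn(500, 8)
rcs = torch.randn(500) * 10.0
bev = radar_points_to_bev(pts, feats, rcs)
print(bev.shape)  # torch.Size([8, 128, 128])
```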
arXiv Detail & Related papers (2024-03-25T06:02:05Z) - CenterRadarNet: Joint 3D Object Detection and Tracking Framework using 4D FMCW Radar [28.640714690346353]
CenterRadarNet is designed to facilitate high-resolution representation learning from 4D (Doppler-range-azimuth-elevation) radar data.
As a single-stage 3D object detector, CenterRadarNet infers the BEV object distribution confidence maps, corresponding 3D bounding box attributes, and appearance embedding for each pixel.
In diverse driving scenarios, CenterRadarNet shows consistent, robust performance, emphasizing its wide applicability.
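A per-pixel multi-head output of this kind can be sketched as follows; the channel counts, class count, and box parameterization are illustrative assumptions, not CenterRadarNet's actual configuration.

```python
import torch
import torch.nn as nn


class CenterStyleHead(nn.Module):
    """Per-pixel prediction heads over a BEV feature map: a class heatmap,
    3D box attributes, and an appearance embedding for tracking."""

    def __init__(self, in_ch=64, num_classes=3, box_dims=8, embed_dims=32):
        super().__init__()

        def head(out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(in_ch, out_ch, 1),
            )

        self.heatmap = head(num_classes)  # object-center confidence per class
        self.boxes = head(box_dims)       # e.g. center offset, z, size, yaw, velocity
        self.embed = head(embed_dims)     # appearance embedding for re-identification

    def forward(self, bev):
        return self.heatmap(bev).sigmoid(), self.boxes(bev), self.embed(bev)


heat, boxes, emb = CenterStyleHead()(torch.randn(1, 64, 128, 128))
print(heat.shape, boxes.shape, emb.shape)
```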
arXiv Detail & Related papers (2023-11-02T17:36:40Z) - RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection [15.686167262542297]
We propose Radar-Camera Multi-level fusion (RCM-Fusion), which attempts to fuse both modalities at both feature and instance levels.
For feature-level fusion, we propose a Radar Guided BEV which transforms camera features into precise BEV representations.
For instance-level fusion, we propose a Radar Grid Point Refinement module that reduces localization error.
arXiv Detail & Related papers (2023-07-17T07:22:25Z) - Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection [78.59426158981108]
We introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the challenges and improve 3D detection for dynamic objects.
We conduct extensive experiments on nuScenes and ORR datasets, and show that our Bi-LRFusion achieves state-of-the-art performance for detecting dynamic objects.
arXiv Detail & Related papers (2023-06-02T10:57:41Z) - Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z) - MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion [6.639648061168067]
Multi-view radar-camera fused 3D object detection provides a farther detection range and more helpful features for autonomous driving.
Current radar-camera fusion methods offer various designs for fusing radar information with camera data.
We present MVFusion, a novel Multi-View radar-camera Fusion method to achieve semantic-aligned radar features.
arXiv Detail & Related papers (2023-02-21T08:25:50Z) - CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection [12.557361522985898]
We propose a camera-radar matching network CramNet to fuse the sensor readings from camera and radar in a joint 3D space.
Our method supports training with sensor modality dropout, which leads to robust 3D object detection, even when a camera or radar sensor suddenly malfunctions on a vehicle.
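Sensor modality dropout of this kind is straightforward to sketch: during training, one branch's features are occasionally zeroed so the fused detector learns to cope with a missing camera or radar. The drop probability and tensor shapes below are illustrative assumptions.

```python
import torch


def modality_dropout(camera_feat, radar_feat, p_drop=0.2, training=True):
    """Randomly drop one modality's features during training (never both)."""
    if not training:
        return camera_feat, radar_feat
    r = torch.rand(1).item()
    if r < p_drop:                       # simulate a failed camera
        camera_feat = torch.zeros_like(camera_feat)
    elif r < 2 * p_drop:                 # simulate a failed radar
        radar_feat = torch.zeros_like(radar_feat)
    return camera_feat, radar_feat


cam = torch.randn(2, 64, 32, 32)
rad = torch.randn(2, 64, 32, 32)
cam, rad = modality_dropout(cam, rad)
```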
arXiv Detail & Related papers (2022-10-17T17:18:47Z) - Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
arXiv Detail & Related papers (2022-05-27T05:42:16Z) - PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object Tracking (Accepted as an extended abstract in JRDB-ACT Workshop at CVPR21) [68.12101204123422]
A point cloud is a dense compilation of spatial data in 3D coordinates.
We propose a PointNet-based approach for 3D Multi-Object Tracking (MOT).
arXiv Detail & Related papers (2021-06-03T05:36:39Z) - RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects [73.80316195652493]
We tackle the problem of exploiting Radar for perception in the context of self-driving cars.
We propose a new solution that exploits both LiDAR and Radar sensors for perception.
Our approach, dubbed RadarNet, features a voxel-based early fusion and an attention-based late fusion.
arXiv Detail & Related papers (2020-07-28T17:15:02Z) - siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection [65.03384167873564]
A siamese network is integrated into the pipeline of a well-known 3D object detector approach.
These associations are exploited to enhance the 3D box regression of the object.
The experimental evaluation on the nuScenes dataset shows that the proposed method outperforms traditional NMS approaches.
arXiv Detail & Related papers (2020-02-19T15:32:38Z)