Related papers: CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection

CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection

URL: http://arxiv.org/abs/2507.04587v1
Date: Mon, 07 Jul 2025 00:45:53 GMT
Title: CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection
Authors: Hanzhi Zhong, Zhiyu Xiang, Ruoyu Xu, Jingyun Fu, Peng Xu, Shaohong Wang, Zhihao Yang, Tianyu Pu, Eryun Liu,
Abstract summary: We propose a cross-view two-stage fusion network called CVFusion.<n>In the first stage, we design a radar guided iterative (RGIter) BEV fusion module to generate high-recall 3D proposal boxes.<n>In the second stage, we aggregate features from multiple heterogeneous views including points, image, and BEV for each proposal.<n>Our method outperforms the previous state-of-the-art methods by a large margin, with 9.10% and 3.68% mAP improvements on View-of-Delft (VoD) and TJ4DRadSet, respectively.
Score: 11.109888378081187
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 4D radar has received significant attention in autonomous driving thanks to its robustness under adverse weathers. Due to the sparse points and noisy measurements of the 4D radar, most of the research finish the 3D object detection task by integrating images from camera and perform modality fusion in BEV space. However, the potential of the radar and the fusion mechanism is still largely unexplored, hindering the performance improvement. In this study, we propose a cross-view two-stage fusion network called CVFusion. In the first stage, we design a radar guided iterative (RGIter) BEV fusion module to generate high-recall 3D proposal boxes. In the second stage, we aggregate features from multiple heterogeneous views including points, image, and BEV for each proposal. These comprehensive instance level features greatly help refine the proposals and generate high-quality predictions. Extensive experiments on public datasets show that our method outperforms the previous state-of-the-art methods by a large margin, with 9.10% and 3.68% mAP improvements on View-of-Delft (VoD) and TJ4DRadSet, respectively. Our code will be made publicly available.

Related papers

SFGFusion: Surface Fitting Guided 3D Object Detection with 4D Radar and Camera Fusion [12.877894178462297]
We introduce SFGFusion, a novel camera-4D imaging radar detection network guided by surface fitting.<n>The explicit surface fitting model enhances spatial representation and cross-modal interaction, enabling more reliable prediction of fine-grained dense depth.<n> Experimental results show that SFGFusion effectively fuses camera and 4D radar features, achieving superior performance on the TJ4DRadSet and view-of-delft (VoD) object detection benchmarks.
arXiv Detail & Related papers (2025-10-22T03:56:27Z)
MLF-4DRCNet: Multi-Level Fusion with 4D Radar and Camera for 3D Object Detection in Autonomous Driving [31.26862558777292]
MLF-4DRCNet is a novel framework for 3D object detection via multi-level fusion of 4D radar and camera images.<n>It incorporates the point-, scene-, and proposal-level multi-modal information, enabling comprehensive feature representation.<n>It attains performance comparable to LiDAR-based models on the View-of-Delft dataset.
arXiv Detail & Related papers (2025-09-23T04:02:28Z)
ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving [7.037019489455008]
We propose a 3D object detection method, termed ZFusion, which fuses 4D radar and vision modality.<n> FP-DDCA fuser packs Transformer blocks to interactively fuse multi-modal features at different scales.<n>Experiments show that ZFusion achieved the state-of-the-art mAP in the region of interest.
arXiv Detail & Related papers (2025-04-04T13:29:32Z)
RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [68.99784784185019]
Poor lighting or adverse weather conditions degrade camera performance.<n>Radar suffers from noise and positional ambiguity.<n>We propose RobuRCDet, a robust object detection model in BEV.
arXiv Detail & Related papers (2025-02-18T17:17:38Z)
Doracamom: Joint 3D Detection and Occupancy Prediction with Multi-view 4D Radars and Cameras for Omnidirectional Perception [9.76463525667238]
We propose Doracamom, the first framework that fuses multi-view cameras and 4D radar for joint 3D object detection and semantic occupancy prediction.<n>Code and models will be publicly available.
arXiv Detail & Related papers (2025-01-26T04:24:07Z)
RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion [58.77329237533034]
We propose a Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection.<n>RaCFormer achieves superior results of 64.9% mAP and 70.2% on nuScenes datasets.
arXiv Detail & Related papers (2024-12-17T09:47:48Z)
MSSF: A 4D Radar and Camera Fusion Framework With Multi-Stage Sampling for 3D Object Detection in Autonomous Driving [9.184945917823047]
We present a simple but effective multi-stage sampling fusion (MSSF) network based on 4D radar and camera. MSSF achieves a 7.0% and 4.0% improvement in 3D mean average precision on the View-of-Delft (VoD) and TJ4DRadset datasets. It even surpasses classical LiDAR-based methods on the VoD dataset.
arXiv Detail & Related papers (2024-11-22T15:45:23Z)
V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection [64.93675471780209]
We present V2X-R, the first simulated V2X dataset incorporating LiDAR, camera, and 4D radar.<n>V2X-R contains 12,079 scenarios with 37,727 frames of LiDAR and 4D radar point clouds, 150,908 images, and 170,859 annotated 3D vehicle bounding boxes.<n>We propose a novel cooperative LiDAR-4D radar fusion pipeline for 3D object detection and implement it with various fusion strategies.
arXiv Detail & Related papers (2024-11-13T07:41:47Z)
UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection [2.123197540438989]
Many radar-vision fusion models treat radar as a sparse LiDAR, underutilizing radar-specific information. We propose the Radar Depth Lift-Splat-Shoot (RDL) module, which integrates radar-specific data into the depth prediction process. We also introduce a Unified Feature Fusion (UFF) approach that extracts BEV features across different modalities.
arXiv Detail & Related papers (2024-09-23T06:57:27Z)
DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving. We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems. Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion. We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
Bridging the View Disparity of Radar and Camera Features for Multi-modal Fusion 3D Object Detection [6.959556180268547]
This paper focuses on how to utilize millimeter-wave (MMW) radar and camera sensor fusion for 3D object detection. A novel method which realizes the feature-level fusion under bird-eye view (BEV) for a better feature representation is proposed.
arXiv Detail & Related papers (2022-08-25T13:21:37Z)
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [105.96557764248846]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework. It unifies multi-modal features in the shared bird's-eye view representation space. It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
arXiv Detail & Related papers (2022-05-26T17:59:35Z)
Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection. The whole architecture facilitates two-stage fusion. Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.