SFGFusion: Surface Fitting Guided 3D Object Detection with 4D Radar and Camera Fusion
- URL: http://arxiv.org/abs/2510.19215v1
- Date: Wed, 22 Oct 2025 03:56:27 GMT
- Title: SFGFusion: Surface Fitting Guided 3D Object Detection with 4D Radar and Camera Fusion
- Authors: Xiaozhi Li, Huijun Di, Jian Li, Feng Liu, Wei Liang
- Abstract summary: We introduce SFGFusion, a novel camera-4D imaging radar detection network guided by surface fitting. The explicit surface fitting model enhances spatial representation and cross-modal interaction, enabling more reliable prediction of fine-grained dense depth. Experimental results show that SFGFusion effectively fuses camera and 4D radar features, achieving superior performance on the TJ4DRadSet and View-of-Delft (VoD) object detection benchmarks.
- Score: 12.877894178462297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection is essential for autonomous driving. As an emerging sensor, 4D imaging radar offers advantages such as low cost, long-range detection, and accurate velocity measurement, making it highly suitable for object detection. However, its sparse point clouds and low resolution limit object geometric representation and hinder multi-modal fusion. In this study, we introduce SFGFusion, a novel camera-4D imaging radar detection network guided by surface fitting. By estimating quadratic surface parameters of objects from image and radar data, the explicit surface fitting model enhances spatial representation and cross-modal interaction, enabling more reliable prediction of fine-grained dense depth. The predicted depth serves two purposes: 1) in an image branch, it guides the transformation of image features from perspective view (PV) to a unified bird's-eye view (BEV) for multi-modal fusion, improving spatial mapping accuracy; and 2) in a surface pseudo-point branch, it is used to generate a dense pseudo-point cloud, mitigating radar point sparsity. The original radar point cloud is also encoded in a separate radar branch. These two point cloud branches adopt a pillar-based method and subsequently transform the features into the BEV space. Finally, a standard 2D backbone and detection head are used to predict object labels and bounding boxes from BEV features. Experimental results show that SFGFusion effectively fuses camera and 4D radar features, achieving superior performance on the TJ4DRadSet and View-of-Delft (VoD) object detection benchmarks.
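The surface-fitting and pseudo-point ideas in the abstract can be illustrated with a small NumPy sketch. The abstract does not specify how the quadratic surface parameters are estimated, so this example substitutes ordinary least squares on sparse depth samples inside one object box, evaluates the fitted surface on a pixel grid to get dense depth, and back-projects that depth to 3D pseudo-points with a pinhole camera model. The bivariate quadratic in normalized image coordinates, the box, the camera intrinsics, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def normalize(u, v, box):
    """Map pixel coordinates to [-1, 1] inside box = (u_min, v_min, u_max, v_max)."""
    u_min, v_min, u_max, v_max = box
    un = 2.0 * (np.asarray(u, dtype=float) - u_min) / (u_max - u_min) - 1.0
    vn = 2.0 * (np.asarray(v, dtype=float) - v_min) / (v_max - v_min) - 1.0
    return un, vn

def quadratic_design(un, vn):
    """Design matrix for depth = a*u^2 + b*v^2 + c*u*v + d*u + e*v + f."""
    return np.stack([un * un, vn * vn, un * vn, un, vn, np.ones_like(un)], axis=-1)

def fit_quadratic_surface(u, v, depth, box):
    """Least-squares estimate of the six surface parameters (a stand-in estimator)."""
    A = quadratic_design(*normalize(u, v, box))
    params, *_ = np.linalg.lstsq(A, np.asarray(depth, dtype=float), rcond=None)
    return params

def dense_depth(params, u, v, box):
    """Evaluate the fitted surface at arbitrary pixel locations."""
    return quadratic_design(*normalize(u, v, box)) @ params

def back_project(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixels with depth to camera-frame 3D points."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

# Synthetic example: 12 sparse "radar" depth samples inside a hypothetical 2D box.
rng = np.random.default_rng(0)
box = (100.0, 50.0, 200.0, 120.0)
true_params = np.array([0.5, -0.3, 0.1, 1.0, -0.5, 20.0])  # made-up ground truth
u_sparse = rng.uniform(box[0], box[2], size=12)
v_sparse = rng.uniform(box[1], box[3], size=12)
d_sparse = quadratic_design(*normalize(u_sparse, v_sparse, box)) @ true_params

params = fit_quadratic_surface(u_sparse, v_sparse, d_sparse, box)

# Dense depth on a coarse pixel grid, then dense pseudo-points in the camera frame.
uu, vv = np.meshgrid(np.arange(100, 200, 10), np.arange(50, 120, 10))
depth_grid = dense_depth(params, uu, vv, box)
pseudo_points = back_project(uu, vv, depth_grid,
                             fx=800.0, fy=800.0, cx=160.0, cy=90.0).reshape(-1, 3)
```

Normalizing pixel coordinates to the box keeps the least-squares problem well conditioned. In a real pipeline the sparse samples would be noisy and possibly fewer than six per object, which is one reason a learned estimator that also uses image features may be preferable to a plain fit.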
Related papers
- RadarGen: Automotive Radar Point Cloud Generation from Cameras [64.69976771710057]
We present RadarGen, a diffusion model for synthesizing realistic automotive radar point clouds from multi-view camera imagery. RadarGen adapts efficient image-latent diffusion to the radar domain by representing radar measurements in bird's-eye-view form. We show that RadarGen captures characteristic radar measurement distributions and reduces the gap to perception models trained on real data.
arXiv Detail & Related papers (2025-12-19T18:57:33Z)
- M^3Detection: Multi-Frame Multi-Level Feature Fusion for Multi-Modal 3D Object Detection with Camera and 4D Imaging Radar [12.877894178462297]
M3Detection is a unified multi-frame 3D object detection framework that performs multi-level feature fusion on multi-modal data from camera and 4D radar. We show that M3Detection achieves state-of-the-art 3D detection performance, demonstrating its effectiveness in multi-frame detection with camera-4D imaging radar fusion.
arXiv Detail & Related papers (2025-10-31T04:34:15Z)
- MLF-4DRCNet: Multi-Level Fusion with 4D Radar and Camera for 3D Object Detection in Autonomous Driving [31.26862558777292]
MLF-4DRCNet is a novel framework for 3D object detection via multi-level fusion of 4D radar and camera images. It incorporates the point-, scene-, and proposal-level multi-modal information, enabling comprehensive feature representation. It attains performance comparable to LiDAR-based models on the View-of-Delft dataset.
arXiv Detail & Related papers (2025-09-23T04:02:28Z)
- RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [68.99784784185019]
Poor lighting or adverse weather conditions degrade camera performance. Radar suffers from noise and positional ambiguity. We propose RobuRCDet, a robust object detection model in BEV.
arXiv Detail & Related papers (2025-02-18T17:17:38Z)
- RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion [58.77329237533034]
We propose a Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection. RaCFormer achieves superior results of 64.9% mAP and 70.2% NDS on the nuScenes dataset.
arXiv Detail & Related papers (2024-12-17T09:47:48Z)
- V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection [64.93675471780209]
We present V2X-R, the first simulated V2X dataset incorporating LiDAR, camera, and 4D radar. V2X-R contains 12,079 scenarios with 37,727 frames of LiDAR and 4D radar point clouds, 150,908 images, and 170,859 annotated 3D vehicle bounding boxes. We propose a novel cooperative LiDAR-4D radar fusion pipeline for 3D object detection and implement it with various fusion strategies.
arXiv Detail & Related papers (2024-11-13T07:41:47Z)
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
- 4DRVO-Net: Deep 4D Radar-Visual Odometry Using Multi-Modal and Multi-Scale Adaptive Fusion [2.911052912709637]
Four-dimensional (4D) radar-visual odometry (4DRVO) integrates complementary information from 4D radar and cameras.
4DRVO may exhibit significant tracking errors owing to sparsity of 4D radar point clouds.
We present 4DRVO-Net, which is a method for 4D radar-visual odometry.
arXiv Detail & Related papers (2023-08-12T14:00:09Z)
- SMURF: Spatial Multi-Representation Fusion for 3D Object Detection with 4D Imaging Radar [12.842457981088378]
This paper introduces spatial multi-representation fusion (SMURF), a novel approach to 3D object detection using a single 4D imaging radar.
SMURF mitigates measurement inaccuracy caused by limited angular resolution and multi-path propagation of radar signals.
Experimental evaluations on View-of-Delft (VoD) and TJ4DRadSet datasets demonstrate the effectiveness and generalization ability of SMURF.
arXiv Detail & Related papers (2023-07-20T11:33:46Z)
- Bridging the View Disparity of Radar and Camera Features for Multi-modal Fusion 3D Object Detection [6.959556180268547]
This paper focuses on how to utilize millimeter-wave (MMW) radar and camera sensor fusion for 3D object detection.
It proposes a novel method that performs feature-level fusion in bird's-eye view (BEV) for better feature representation.
arXiv Detail & Related papers (2022-08-25T13:21:37Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)