CRAB: Camera-Radar Fusion for Reducing Depth Ambiguity in Backward Projection based View Transformation
- URL: http://arxiv.org/abs/2509.05785v1
- Date: Sat, 06 Sep 2025 17:39:30 GMT
- Title: CRAB: Camera-Radar Fusion for Reducing Depth Ambiguity in Backward Projection based View Transformation
- Authors: In-Jae Lee, Sihwan Hwang, Youngseok Kim, Wonjune Kim, Sanmin Kim, Dongsuk Kum
- Abstract summary: We propose a camera-radar fusion-based 3D object detection and segmentation model named CRAB. CRAB aggregates perspective view image context features into BEV queries. It improves depth distinction among queries along the same ray by combining the dense but unreliable depth distribution from images with the sparse yet precise depth information from radar occupancy.
- Score: 19.748485957698907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, camera-radar fusion-based 3D object detection methods in bird's eye view (BEV) have gained attention due to the complementary characteristics and cost-effectiveness of these sensors. Previous approaches using forward projection struggle with sparse BEV feature generation, while those employing backward projection overlook depth ambiguity, leading to false positives. In this paper, to address the aforementioned limitations, we propose a novel camera-radar fusion-based 3D object detection and segmentation model named CRAB (Camera-Radar fusion for reducing depth Ambiguity in Backward projection-based view transformation), using a backward projection that leverages radar to mitigate depth ambiguity. During the view transformation, CRAB aggregates perspective view image context features into BEV queries. It improves depth distinction among queries along the same ray by combining the dense but unreliable depth distribution from images with the sparse yet precise depth information from radar occupancy. We further introduce spatial cross-attention with a feature map containing radar context information to enhance the comprehension of the 3D scene. When evaluated on the nuScenes open dataset, our proposed approach achieves a state-of-the-art performance among backward projection-based camera-radar fusion methods with 62.4% NDS and 54.0% mAP in 3D object detection.
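The abstract's core idea, that sparse but precise radar occupancy can disambiguate a dense but unreliable image depth distribution along a camera ray, can be illustrated with a minimal sketch. This is not the authors' code; the function name, the fusion rule, and the `alpha` trust parameter are assumptions chosen only to make the intuition concrete.

```python
import numpy as np

def fuse_depth_weights(img_depth_dist, radar_occupancy, alpha=0.7, eps=1e-8):
    """Fuse depth cues along one camera ray (illustrative sketch).

    img_depth_dist: (D,) normalized depth probabilities from the image
                    (dense but possibly ambiguous).
    radar_occupancy: (D,) occupancy along the ray from radar returns
                     (mostly zero; nonzero near measured depths).
    alpha: trust placed in radar where it has returns (assumed parameter).
    Returns a normalized (D,) weight vector for BEV queries along the ray.
    """
    if radar_occupancy.sum() > 0:
        # Boost depth bins supported by a radar return; keep a fraction of
        # the image distribution elsewhere as a fallback.
        fused = (1 - alpha) * img_depth_dist \
                + alpha * img_depth_dist * radar_occupancy
    else:
        # No radar evidence on this ray: fall back to the image distribution.
        fused = img_depth_dist
    return fused / (fused.sum() + eps)

# Toy example: the image depth is ambiguous between bins 3 and 7;
# a single radar return at bin 7 resolves the ambiguity.
img = np.full(10, 0.02)
img[3], img[7] = 0.45, 0.45
radar = np.zeros(10)
radar[7] = 1.0
weights = fuse_depth_weights(img, radar)
print(int(np.argmax(weights)))  # → 7
```

The design point this sketch captures is that the fused weights stay a valid distribution, so queries along the same ray compete for depth rather than all receiving equal weight, which is the failure mode the abstract attributes to plain backward projection.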
Related papers
- SFGFusion: Surface Fitting Guided 3D Object Detection with 4D Radar and Camera Fusion [12.877894178462297]
We introduce SFGFusion, a novel camera-4D imaging radar detection network guided by surface fitting. The explicit surface fitting model enhances spatial representation and cross-modal interaction, enabling more reliable prediction of fine-grained dense depth. Experimental results show that SFGFusion effectively fuses camera and 4D radar features, achieving superior performance on the TJ4DRadSet and view-of-delft (VoD) object detection benchmarks.
arXiv Detail & Related papers (2025-10-22T03:56:27Z) - FreqPDE: Rethinking Positional Depth Embedding for Multi-View 3D Object Detection Transformers [91.59069344768858]
We introduce Frequency-aware Positional Depth Embedding (FreqPDE) to equip 2D image features with spatial information for the 3D detection transformer decoder. FreqPDE combines the 2D image features and 3D position embeddings to generate 3D depth-aware features for query decoding.
arXiv Detail & Related papers (2025-10-17T07:36:54Z) - PAN: Pillars-Attention-Based Network for 3D Object Detection [3.3274570204477922]
This work presents a novel 3D object detection algorithm using cameras and radars in the bird's-eye-view (BEV). Our algorithm exploits the advantages of radar before fusing the features into a detection head. A new backbone is introduced, which maps the radar pillar features into an embedded dimension.
arXiv Detail & Related papers (2025-09-19T12:40:49Z) - RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection [68.99784784185019]
Poor lighting or adverse weather conditions degrade camera performance. Radar suffers from noise and positional ambiguity. We propose RobuRCDet, a robust object detection model in BEV.
arXiv Detail & Related papers (2025-02-18T17:17:38Z) - RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion [58.77329237533034]
We propose a Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection. RaCFormer achieves superior results of 64.9% mAP on the nuScenes dataset.
arXiv Detail & Related papers (2024-12-17T09:47:48Z) - LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and Camera Fusion [14.520176332262725]
This paper investigates the "sampling" view transformation strategy on the camera and 4D imaging radar fusion-based 3D object detection.
We show that more accurate view transformation can be performed by introducing image depths and radar information to enhance the "sampling" strategy.
Experiments on VoD and TJ4DRadSet datasets show that the proposed method outperforms the state-of-the-art 3D object detection methods by a significant margin without bells and whistles.
arXiv Detail & Related papers (2023-07-03T03:09:44Z) - EA-LSS: Edge-aware Lift-splat-shot Framework for 3D BEV Object Detection [9.289537252177048]
We propose a novel Edge-aware Lift-splat-shot (EA-LSS) framework for 3D object detection.
Our EA-LSS framework is compatible for any LSS-based 3D object detection models.
arXiv Detail & Related papers (2023-03-31T08:56:29Z) - OA-DET3D: Embedding Object Awareness as a General Plug-in for Multi-Camera 3D Object Detection [77.43427778037203]
We introduce OA-DET3D, a plug-in module that improves 3D object detection. OA-DET3D boosts the representation of objects by leveraging object-centric depth information and foreground pseudo points. We conduct extensive experiments on the nuScenes dataset and Argoverse 2 dataset to validate the merits of OA-DET3D.
arXiv Detail & Related papers (2023-01-13T06:02:31Z) - OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection [51.153003057515754]
OPA-3D is a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network.
It jointly estimates dense scene depth with depth-bounding box residuals and object bounding boxes.
It outperforms state-of-the-art methods on the main Car category.
arXiv Detail & Related papers (2022-11-02T14:19:13Z) - BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection [13.319949358652192]
We propose a new 3D object detector with a trustworthy depth estimation, dubbed BEVDepth, for camera-based Bird's-Eye-View 3D object detection.
BEVDepth achieves the new state-of-the-art 60.0% NDS on the challenging nuScenes test set.
arXiv Detail & Related papers (2022-06-21T03:21:18Z) - BirdNet+: End-to-End 3D Object Detection in LiDAR Bird's Eye View [117.44028458220427]
On-board 3D object detection in autonomous vehicles often relies on geometry information captured by LiDAR devices.
We present a fully end-to-end 3D object detection framework that can infer oriented 3D boxes solely from BEV images.
arXiv Detail & Related papers (2020-03-09T15:08:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.