CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer
- URL: http://arxiv.org/abs/2209.06535v1
- Date: Wed, 14 Sep 2022 10:25:30 GMT
- Title: CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer
- Authors: Youngseok Kim, Sanmin Kim, Jun Won Choi, Dongsuk Kum
- Abstract summary: Camera and radar sensors have significant advantages in cost, reliability, and maintenance compared to LiDAR.
Existing fusion methods often fuse the outputs of single modalities at the result-level, called the late fusion strategy.
Here we propose a novel proposal-level early fusion approach that effectively exploits both spatial and contextual properties of camera and radar for 3D object detection.
Our camera-radar fusion approach achieves the state-of-the-art 41.1% mAP and 52.3% NDS on the nuScenes test set, 8.7 and 10.8 points higher than the camera-only baseline, respectively, while yielding performance competitive with LiDAR-based methods.
- Score: 14.849645397321185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera and radar sensors have significant advantages in cost, reliability,
and maintenance compared to LiDAR. Existing fusion methods often fuse the
outputs of single modalities at the result-level, called the late fusion
strategy. This can benefit from using off-the-shelf single sensor detection
algorithms, but late fusion cannot fully exploit the complementary properties
of sensors, thus having limited performance despite the huge potential of
camera-radar fusion. Here we propose a novel proposal-level early fusion
approach that effectively exploits both spatial and contextual properties of
camera and radar for 3D object detection. Our fusion framework first associates
image proposals with radar points in the polar coordinate system to efficiently
handle the discrepancy between the sensors' coordinate systems and spatial
properties. Using this association as the first stage, consecutive
cross-attention-based feature fusion layers then adaptively exchange
spatio-contextual information between camera and radar, leading to robust and
attentive fusion. Our camera-radar fusion approach achieves state-of-the-art
41.1% mAP and 52.3% NDS on the nuScenes test set, 8.7 and 10.8 points higher
than the camera-only baseline, respectively, while yielding performance
competitive with LiDAR-based methods.
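To make the first stage described above concrete, here is a minimal sketch of associating image proposals with radar points through a window in polar (range, azimuth) coordinates; the function names and window sizes are hypothetical placeholders, not taken from the paper.

```python
import numpy as np

def to_polar(points_xy):
    """Convert bird's-eye-view (x, y) points in the ego frame to (range, azimuth)."""
    rng = np.hypot(points_xy[:, 0], points_xy[:, 1])
    azi = np.arctan2(points_xy[:, 1], points_xy[:, 0])
    return rng, azi

def associate_proposals_with_radar(proposal_centers_xy, radar_xy,
                                   azi_window=0.05, rng_window=2.0):
    """For each image proposal center (already projected to the BEV plane),
    collect indices of radar points inside a polar window around it.
    Window sizes are illustrative, not values from CRAFT."""
    prop_rng, prop_azi = to_polar(proposal_centers_xy)
    radar_rng, radar_azi = to_polar(radar_xy)
    associations = []
    for r, a in zip(prop_rng, prop_azi):
        d_azi = np.abs(np.angle(np.exp(1j * (radar_azi - a))))  # wrap-aware azimuth gap
        in_window = (d_azi < azi_window) & (np.abs(radar_rng - r) < rng_window)
        associations.append(np.nonzero(in_window)[0])
    return associations
```

One reading of the abstract's point about the coordinate-system discrepancy is that a polar window matches the radar's native range/azimuth measurements while tolerating the camera's large depth uncertainty along each ray.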
Related papers
- MSSF: A 4D Radar and Camera Fusion Framework With Multi-Stage Sampling for 3D Object Detection in Autonomous Driving [9.184945917823047]
We present a simple but effective multi-stage sampling fusion (MSSF) network based on 4D radar and camera.
MSSF achieves 7.0% and 4.0% improvements in 3D mean average precision on the View-of-Delft (VoD) and TJ4DRadset datasets, respectively.
It even surpasses classical LiDAR-based methods on the VoD dataset.
arXiv Detail & Related papers (2024-11-22T15:45:23Z)
- FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving [63.96049803915402]
The integration of data from diverse sensor modalities constitutes a prevalent methodology within the ambit of autonomous driving scenarios.
Recent advancements in efficient point cloud transformers have underscored the efficacy of integrating information in sparse formats.
In this paper, we conduct a comprehensive exploration of design choices for Transformer-based sparse camera-LiDAR fusion.
arXiv Detail & Related papers (2024-08-13T11:46:32Z)
- Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System [0.0]
We propose a novel approach to address the problem of camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems.
Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance.
Our results show that the proposed approach achieves superior performance over single-sensor solutions and could directly compete with other top-level fusion methods.
arXiv Detail & Related papers (2024-04-25T12:04:31Z)
- Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion [74.84019379368807]
We propose a novel method named EchoFusion to skip the existing radar signal processing pipeline.
Specifically, we first generate the Bird's Eye View (BEV) queries and then take corresponding spectrum features from radar to fuse with other sensors.
arXiv Detail & Related papers (2023-07-31T09:53:50Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, the fusion for detection can be effectively performed by combining their ROI features.
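As a rough illustration of that fusion step, the snippet below concatenates each 3D proposal's ROI feature with the ROI feature of the 2D proposal assigned to it; the index-based assignment representation and all names are hypothetical, since FBMNet learns the assignments rather than taking them as input.

```python
import torch

def fuse_matched_roi_features(roi_feats_3d, roi_feats_2d, assignment):
    """Concatenate each 3D proposal's ROI feature with that of its matched 2D proposal.

    roi_feats_3d: (N3, C3) features of 3D proposals
    roi_feats_2d: (N2, C2) features of 2D proposals
    assignment:   (N3,) long tensor holding the matched 2D proposal index per 3D proposal
    """
    matched_2d = roi_feats_2d[assignment]                   # (N3, C2) gather matched features
    return torch.cat([roi_feats_3d, matched_2d], dim=-1)    # (N3, C3 + C2) fused ROI features
```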
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion [6.639648061168067]
Multi-view radar-camera fused 3D object detection provides a farther detection range and more helpful features for autonomous driving.
Current radar-camera fusion methods offer various designs to fuse radar information with camera data.
We present MVFusion, a novel Multi-View radar-camera Fusion method to achieve semantic-aligned radar features.
arXiv Detail & Related papers (2023-02-21T08:25:50Z)
- CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection [12.557361522985898]
We propose a camera-radar matching network CramNet to fuse the sensor readings from camera and radar in a joint 3D space.
Our method supports training with sensor modality dropout, which leads to robust 3D object detection, even when a camera or radar sensor suddenly malfunctions on a vehicle.
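The sensor-modality dropout mentioned above can be sketched as a simple training-time augmentation; the probabilities and names below are placeholders rather than CramNet's actual settings.

```python
import torch

def modality_dropout(camera_feats, radar_feats, p_cam=0.15, p_radar=0.15):
    """Per training sample, randomly zero out the camera or the radar features
    (never both at once) so the detector stays usable when one sensor fails
    at test time. Intended to be called only during training."""
    batch = camera_feats.shape[0]
    drop_cam = torch.rand(batch, device=camera_feats.device) < p_cam
    drop_rad = torch.rand(batch, device=radar_feats.device) < p_radar
    drop_rad &= ~drop_cam                                   # always keep at least one modality
    keep_cam = (~drop_cam).float().view(batch, *([1] * (camera_feats.dim() - 1)))
    keep_rad = (~drop_rad).float().view(batch, *([1] * (radar_feats.dim() - 1)))
    return camera_feats * keep_cam, radar_feats * keep_rad
```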
arXiv Detail & Related papers (2022-10-17T17:18:47Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
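The lifting step described above is, at its core, a pinhole back-projection of pixels with estimated depths into 3D; the generic sketch below shows only that step and is not MSMDFusion's multi-depth-seed procedure.

```python
import torch

def lift_pixels_to_3d(pixels_uv, depths, intrinsics):
    """Back-project pixels (u, v) with per-pixel depths into 3D camera coordinates
    using the pinhole model, e.g. x = (u - cx) / fx * z.

    pixels_uv:  (N, 2) pixel coordinates
    depths:     (N,)   depth values (predicted or seeded)
    intrinsics: (3, 3) camera intrinsic matrix
    """
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    x = (pixels_uv[:, 0] - cx) / fx * depths
    y = (pixels_uv[:, 1] - cy) / fy * depths
    return torch.stack([x, y, depths], dim=-1)              # (N, 3) points in the camera frame
```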
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard.
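A soft association of this kind is commonly realized with cross-attention, where LiDAR-derived object queries attend to image features so that unreliable pixels simply receive low attention weight; the layer below is a generic sketch under that assumption, not TransFusion's code.

```python
import torch
import torch.nn as nn

class SoftAssociationLayer(nn.Module):
    """Object queries (e.g. from a LiDAR-based first stage) softly attend to a
    flattened image feature map instead of relying on a hard 3D-to-2D projection."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, queries, image_feats):
        # queries: (B, N, C) object queries; image_feats: (B, H*W, C) image tokens
        attended, weights = self.cross_attn(queries, image_feats, image_feats)
        return self.norm(queries + attended), weights        # residual update + attention map
```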
arXiv Detail & Related papers (2022-03-22T07:15:13Z)
- LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation [78.74202673902303]
We propose a coarse-to-fine LiDAR and camera fusion-based network (termed LIF-Seg) for LiDAR segmentation.
The proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy.
The cooperation of these two components leads to effective camera-LiDAR fusion.
arXiv Detail & Related papers (2021-08-17T08:53:11Z)