Bridging the View Disparity of Radar and Camera Features for Multi-modal
Fusion 3D Object Detection
- URL: http://arxiv.org/abs/2208.12079v1
- Date: Thu, 25 Aug 2022 13:21:37 GMT
- Title: Bridging the View Disparity of Radar and Camera Features for Multi-modal
Fusion 3D Object Detection
- Authors: Taohua Zhou, Yining Shi, Junjie Chen, Kun Jiang, Mengmeng Yang, Diange
Yang
- Abstract summary: This paper focuses on how to utilize millimeter-wave (MMW) radar and camera sensor fusion for 3D object detection.
A novel method is proposed that realizes feature-level fusion under the bird's-eye view (BEV) for a better feature representation.
- Score: 6.959556180268547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Environmental perception with multi-modal fusion of radar and camera is
crucial in autonomous driving for increasing accuracy, completeness, and
robustness. This paper focuses on how to utilize millimeter-wave (MMW) radar
and camera sensor fusion for 3D object detection. A novel method is proposed
that realizes feature-level fusion under the bird's-eye view (BEV) for a better
feature representation. First, radar features are augmented through temporal
accumulation and sent to a temporal-spatial encoder for radar feature
extraction. Meanwhile, multi-scale 2D image features adapted to various spatial
scales are obtained from the image backbone and neck. The image features are
then transformed to BEV with the designed view transformer. In addition, this
work fuses the multi-modal features with a two-stage fusion model consisting of
point fusion and ROI fusion. Finally, a detection head regresses object
categories and 3D locations. Experimental results demonstrate that the proposed
method achieves state-of-the-art performance on the most important detection
metrics, mean average precision (mAP) and nuScenes detection score (NDS), on
the challenging nuScenes dataset.
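The camera-to-BEV view transformation described in the abstract can be illustrated with a minimal numpy sketch: pixels of an image feature map are back-projected to 3D with a per-pixel depth and scatter-pooled into a BEV grid. This is a hypothetical simplification for illustration only; the function name, the grid parameters, and the assumption of a known per-pixel depth are not taken from the paper's actual view transformer design.

```python
import numpy as np

def lift_to_bev(img_feats, depths, K, bev_shape=(50, 50), bev_range=50.0):
    """Scatter image features into a BEV grid using per-pixel depth.

    img_feats: (H, W, C) image feature map.
    depths:    (H, W) metric depth along the camera z-axis (assumed known here).
    K:         3x3 camera intrinsics; z is forward, x is right.
    Height is collapsed, so the BEV grid is indexed by (forward z, lateral x).
    """
    H, W, C = img_feats.shape
    us = np.tile(np.arange(W), (H, 1))          # pixel column index per pixel
    fx, cx = K[0, 0], K[0, 2]
    z = depths                                  # forward distance
    x = (us - cx) * z / fx                      # back-projected lateral offset
    cell = bev_range / bev_shape[0]             # BEV cell size in metres
    iz = (z / cell).astype(int)
    ix = ((x + bev_range / 2) / cell).astype(int)
    bev = np.zeros((*bev_shape, C), dtype=img_feats.dtype)
    valid = (iz >= 0) & (iz < bev_shape[0]) & (ix >= 0) & (ix < bev_shape[1])
    # Sum-pool all pixel features that fall into the same BEV cell.
    np.add.at(bev, (iz[valid], ix[valid]), img_feats[valid])
    return bev
```

The resulting BEV feature map shares a coordinate frame with the radar BEV features, which is what makes feature-level fusion of the two modalities straightforward; the paper's actual transformer and the subsequent point/ROI fusion stages are learned modules rather than this fixed projection.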
Related papers
- Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System [0.0]
We propose a novel approach to address the problem of camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems.
Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance.
Our results show that the proposed approach achieves superior performance over single-sensor solutions and could directly compete with other top-level fusion methods.
arXiv Detail & Related papers (2024-04-25T12:04:31Z)
- FusionViT: Hierarchical 3D Object Detection via LiDAR-Camera Vision Transformer Fusion [8.168523242105763]
We introduce a novel vision-transformer-based 3D object detection model, FusionViT.
Our FusionViT model achieves state-of-the-art performance and outperforms existing baseline methods.
arXiv Detail & Related papers (2023-11-07T00:12:01Z)
- ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking [26.976216624424385]
3D multi-object tracking (MOT) is essential for an autonomous mobile agent to safely navigate a scene.
We aim to develop a 3D MOT framework that fuses camera and LiDAR sensor information.
arXiv Detail & Related papers (2023-10-04T02:17:59Z)
- MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion [6.639648061168067]
Multi-view radar-camera fused 3D object detection provides a farther detection range and more helpful features for autonomous driving.
Current radar-camera fusion methods offer a variety of designs for fusing radar information with camera data.
We present MVFusion, a novel multi-view radar-camera fusion method that achieves semantically aligned radar features.
arXiv Detail & Related papers (2023-02-21T08:25:50Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
- DeepFusion: A Robust and Modular 3D Object Detector for Lidars, Cameras and Radars [2.2166853714891057]
We propose a modular multi-modal architecture that fuses lidars, cameras, and radars in different combinations for 3D object detection.
Specialized feature extractors take advantage of each modality and can be exchanged easily, making the approach simple and flexible.
Experimental results for lidar-camera, lidar-camera-radar, and camera-radar fusion show the flexibility and effectiveness of our fusion approach.
arXiv Detail & Related papers (2022-09-26T14:33:30Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches explore the semantic density of camera features by lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection, introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module.
The proposed CB-Fusion module enriches the semantic information of point features with image features in a cascaded, bi-directional interaction-fusion manner.
Experimental results on the KITTI, JRDB, and SUN-RGBD datasets demonstrate the superiority of EPNet++ over state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z)
- Perception-aware Multi-sensor Fusion for 3D LiDAR Semantic Segmentation [59.42262859654698]
3D semantic segmentation is important for scene understanding in many applications, such as autonomous driving and robotics.
Existing fusion-based methods may not achieve promising performance due to the vast difference between the two modalities.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from two modalities.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.