Bridging the View Disparity of Radar and Camera Features for Multi-modal
Fusion 3D Object Detection
- URL: http://arxiv.org/abs/2208.12079v1
- Date: Thu, 25 Aug 2022 13:21:37 GMT
- Title: Bridging the View Disparity of Radar and Camera Features for Multi-modal
Fusion 3D Object Detection
- Authors: Taohua Zhou, Yining Shi, Junjie Chen, Kun Jiang, Mengmeng Yang, Diange
Yang
- Abstract summary: This paper focuses on how to utilize millimeter-wave (MMW) radar and camera sensor fusion for 3D object detection.
A novel method that realizes feature-level fusion under bird's-eye view (BEV) for a better feature representation is proposed.
- Score: 6.959556180268547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Environmental perception with multi-modal fusion of radar and camera is
crucial in autonomous driving to increase accuracy, completeness, and
robustness. This paper focuses on how to utilize millimeter-wave (MMW) radar
and camera sensor fusion for 3D object detection. A novel method that realizes
feature-level fusion under bird's-eye view (BEV) for a better feature
representation is proposed. First, radar features are augmented with temporal
accumulation and sent to a temporal-spatial encoder for radar feature
extraction. Meanwhile, multi-scale 2D image features that adapt to various
spatial scales are obtained by an image backbone and neck. Then, image
features are transformed to BEV with the designed view transformer. In
addition, this work fuses the multi-modal features with a two-stage fusion
model comprising point fusion and ROI fusion. Finally, a detection head
regresses object categories and 3D locations. Experimental results
demonstrate that the proposed method achieves state-of-the-art performance
on the most important detection metrics, mean average precision (mAP) and
nuScenes detection score (NDS), on the challenging nuScenes dataset.
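The abstract describes a two-stage fusion in BEV: a dense "point fusion" of radar and camera BEV features followed by an "ROI fusion" over candidate boxes. Below is a minimal PyTorch-style sketch of that idea, assuming hypothetical module names and shapes (`PointFusion`, `ROIFusion`, a shared 128x128 BEV grid); it is not taken from the authors' code.

```python
# Sketch of a two-stage BEV fusion (hypothetical modules, not the paper's code).
import torch
import torch.nn as nn
from torchvision.ops import roi_align


class PointFusion(nn.Module):
    """Stage 1: dense per-cell fusion of radar and camera BEV features."""
    def __init__(self, radar_ch, cam_ch, out_ch):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(radar_ch + cam_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, radar_bev, cam_bev):
        # Both inputs live on the same BEV grid: (B, C, H, W).
        return self.fuse(torch.cat([radar_bev, cam_bev], dim=1))


class ROIFusion(nn.Module):
    """Stage 2: pool the fused BEV map inside candidate boxes (ROIs)."""
    def __init__(self, in_ch, out_ch, roi_size=7):
        super().__init__()
        self.roi_size = roi_size
        self.fc = nn.Linear(in_ch * roi_size * roi_size, out_ch)

    def forward(self, fused_bev, rois):
        # rois: (N, 5) boxes as (batch_idx, x1, y1, x2, y2) in BEV pixels.
        pooled = roi_align(fused_bev, rois, output_size=self.roi_size)
        return self.fc(pooled.flatten(1))  # (N, out_ch) per-ROI descriptor


# Toy usage on a 128x128 BEV grid (random tensors, shapes only).
radar_bev = torch.randn(1, 64, 128, 128)
cam_bev = torch.randn(1, 64, 128, 128)
fused = PointFusion(64, 64, 128)(radar_bev, cam_bev)   # (1, 128, 128, 128)
rois = torch.tensor([[0.0, 10.0, 10.0, 40.0, 40.0]])
roi_feats = ROIFusion(128, 256)(fused, rois)           # (1, 256)
```

A detection head would then regress categories and 3D boxes from the dense fused map and refine them with the per-ROI descriptors, in the spirit of the two-stage design the abstract outlines.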
Related papers
- CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection [9.509625131289429]
We introduce CRT-Fusion, a novel framework that integrates temporal information into radar-camera fusion.
CRT-Fusion achieves state-of-the-art performance for radar-camera-based 3D object detection.
arXiv Detail & Related papers (2024-11-05T11:25:19Z)
- Progressive Multi-Modal Fusion for Robust 3D Object Detection [12.048303829428452]
Existing methods perform sensor fusion in a single view by projecting features from both modalities either in Bird's Eye View (BEV) or Perspective View (PV).
We propose ProFusion3D, a progressive fusion framework that combines features in both BEV and PV at both intermediate and object query levels.
Our architecture hierarchically fuses local and global features, enhancing the robustness of 3D object detection.
arXiv Detail & Related papers (2024-10-09T22:57:47Z)
- RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network [34.45694077040797]
We present RCBEVDet, a radar-camera fusion 3D object detection framework.
RadarBEVNet encodes sparse radar points into a dense bird's-eye-view feature (a generic sketch of this kind of rasterization follows this list).
Our method achieves state-of-the-art radar-camera fusion results in 3D object detection, BEV semantic segmentation, and 3D multi-object tracking tasks.
arXiv Detail & Related papers (2024-09-08T05:14:27Z)
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
- MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion [6.639648061168067]
Multi-view radar-camera fused 3D object detection provides a farther detection range and more helpful features for autonomous driving.
Current radar-camera fusion methods offer various designs for fusing radar information with camera data.
We present MVFusion, a novel Multi-View radar-camera Fusion method to achieve semantic-aligned radar features.
arXiv Detail & Related papers (2023-02-21T08:25:50Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
- DeepFusion: A Robust and Modular 3D Object Detector for Lidars, Cameras and Radars [2.2166853714891057]
We propose a modular multi-modal architecture to fuse lidars, cameras and radars in different combinations for 3D object detection.
Specialized feature extractors take advantage of each modality and can be exchanged easily, making the approach simple and flexible.
Experimental results for lidar-camera, lidar-camera-radar and camera-radar fusion show the flexibility and effectiveness of our fusion approach.
arXiv Detail & Related papers (2022-09-26T14:33:30Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- EPNet++: Cascade Bi-directional Fusion for Multi-Modal 3D Object Detection [56.03081616213012]
We propose EPNet++ for multi-modal 3D object detection by introducing a novel Cascade Bi-directional Fusion (CB-Fusion) module.
The proposed CB-Fusion module boosts the plentiful semantic information of point features with the image features in a cascade bi-directional interaction fusion manner.
The experiment results on the KITTI, JRDB and SUN-RGBD datasets demonstrate the superiority of EPNet++ over the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-21T10:48:34Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
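Several of the radar-camera methods above, like the main paper's radar branch, begin by accumulating radar sweeps over time and rasterizing the sparse points onto a BEV grid before any learned encoding. Below is a minimal NumPy sketch of that preprocessing step; the grid bounds, cell size, and per-cell features (occupancy, mean RCS, mean radial velocity) are illustrative assumptions, not taken from any of these papers.

```python
# Sketch: accumulate radar sweeps and scatter them onto a BEV grid (assumed layout).
import numpy as np


def radar_points_to_bev(sweeps, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0),
                        cell=0.5):
    """sweeps: list of (N_i, 4) arrays with columns (x, y, rcs, v_radial)."""
    pts = np.concatenate(sweeps, axis=0)             # temporal accumulation
    H = int((x_range[1] - x_range[0]) / cell)        # cells along x
    W = int((y_range[1] - y_range[0]) / cell)        # cells along y
    bev = np.zeros((3, H, W), dtype=np.float32)      # occupancy, RCS, velocity
    counts = np.zeros((H, W), dtype=np.float32)

    ix = ((pts[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((pts[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < H) & (iy >= 0) & (iy < W)
    ix, iy, pts = ix[keep], iy[keep], pts[keep]

    np.add.at(counts, (ix, iy), 1.0)
    np.add.at(bev[1], (ix, iy), pts[:, 2])           # sum RCS per cell
    np.add.at(bev[2], (ix, iy), pts[:, 3])           # sum radial velocity per cell

    occupied = counts > 0
    bev[0][occupied] = 1.0                           # binary occupancy
    bev[1][occupied] /= counts[occupied]             # mean RCS
    bev[2][occupied] /= counts[occupied]             # mean radial velocity
    return bev                                       # (3, H, W) BEV pseudo-image


# Toy usage: two accumulated sweeps of random points.
sweeps = [np.random.randn(200, 4) * 20.0, np.random.randn(150, 4) * 20.0]
bev = radar_points_to_bev(sweeps)                    # shape (3, 200, 200)
```

The resulting (3, H, W) pseudo-image can then be passed to a convolutional temporal-spatial encoder of the kind the main paper's abstract describes.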