3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object
Detection
- URL: http://arxiv.org/abs/2211.13529v1
- Date: Thu, 24 Nov 2022 11:00:50 GMT
- Title: 3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object
Detection
- Authors: Yecheol Kim, Konyul Park, Minwook Kim, Dongsuk Kum, Jun Won Choi
- Abstract summary: We propose a novel camera-LiDAR fusion architecture called 3D Dual-Fusion.
The proposed method fuses the features of the camera-view and 3D voxel-view domain and models their interactions through deformable attention.
The results of an experimental evaluation show that the proposed camera-LiDAR fusion architecture achieved competitive performance on the KITTI and nuScenes datasets.
- Score: 13.068266058374775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fusing data from cameras and LiDAR sensors is an essential technique to
achieve robust 3D object detection. One key challenge in camera-LiDAR fusion
involves mitigating the large domain gap between the two sensors in terms of
coordinates and data distribution when fusing their features. In this paper, we
propose a novel camera-LiDAR fusion architecture called, 3D Dual-Fusion, which
is designed to mitigate the gap between the feature representations of camera
and LiDAR data. The proposed method fuses the features of the camera-view and
3D voxel-view domain and models their interactions through deformable
attention. We redesign the transformer fusion encoder to aggregate the
information from the two domains. Two major changes include 1) dual query-based
deformable attention to fuse the dual-domain features interactively and 2) 3D
local self-attention to encode the voxel-domain queries prior to dual-query
decoding. The results of an experimental evaluation show that the proposed
camera-LiDAR fusion architecture achieved competitive performance on the KITTI
and nuScenes datasets, with state-of-the-art performances in some 3D object
detection benchmarks categories.
Related papers
- Progressive Multi-Modal Fusion for Robust 3D Object Detection [12.048303829428452]
Existing methods perform sensor fusion in a single view by projecting features from both modalities either in Bird's Eye View (BEV) or Perspective View (PV)
We propose ProFusion3D, a progressive fusion framework that combines features in both BEV and PV at both intermediate and object query levels.
Our architecture hierarchically fuses local and global features, enhancing the robustness of 3D object detection.
arXiv Detail & Related papers (2024-10-09T22:57:47Z) - Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System [0.0]
We propose a novel approach to address the problem of camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems.
Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance.
Our results show that the proposed approach achieves superior performance over single-sensor solutions and could directly compete with other top-level fusion methods.
arXiv Detail & Related papers (2024-04-25T12:04:31Z) - FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection [11.962073589763676]
Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm.
The sparsity of point clouds, especially for the points far away, makes it difficult for the LiDAR-only refinement module to accurately recognize and locate objects.
We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest(RoI)
FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP on baseline, and outperforms competing two-stage approaches.
arXiv Detail & Related papers (2022-09-22T02:07:25Z) - MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth
Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z) - Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object
Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z) - TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with
Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve the 1st place in the leaderboard of nuScenes tracking.
arXiv Detail & Related papers (2022-03-22T07:15:13Z) - DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z) - EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF)
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z) - 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View
Spatial Feature Fusion for 3D Object Detection [10.507404260449333]
We propose a new architecture for fusing camera and LiDAR sensors for 3D object detection.
The proposed 3D-CVF achieves state-of-the-art performance in the KITTI benchmark.
arXiv Detail & Related papers (2020-04-27T08:34:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.