mmFUSION: Multimodal Fusion for 3D Objects Detection
- URL: http://arxiv.org/abs/2311.04058v1
- Date: Tue, 7 Nov 2023 15:11:27 GMT
- Title: mmFUSION: Multimodal Fusion for 3D Objects Detection
- Authors: Javed Ahmad and Alessio Del Bue
- Abstract summary: Multi-sensor fusion is essential for accurate 3D object detection in self-driving systems.
In this paper, we propose a new intermediate-level multi-modal fusion approach to overcome these challenges.
The code, as an mmdetection3D project plugin, will be publicly available soon.
- Score: 18.401155770778757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-sensor fusion is essential for accurate 3D object detection in
self-driving systems. Camera and LiDAR are the most commonly used sensors, and
usually, their fusion happens at the early or late stages of 3D detectors with
the help of regions of interest (RoIs). On the other hand, fusion at the
intermediate level is more adaptive because it does not need RoIs from the
modalities, but it is more complex because the features of the two modalities
are represented from different points of view. In this paper, we propose a new
intermediate-level multi-modal fusion (mmFUSION) approach to overcome these
challenges. First, mmFUSION uses separate encoders for each modality to compute
features at a desired, lower spatial volume. Second, these features are fused
through
cross-modality and multi-modality attention mechanisms proposed in mmFUSION.
The mmFUSION framework preserves multi-modal information and learns to
complement modalities' deficiencies through attention weights. The strong
multi-modal features from the mmFUSION framework are fed to a simple 3D
detection head for 3D predictions. We evaluate mmFUSION on the KITTI and
nuScenes datasets, where it outperforms existing early, intermediate, late, and
even two-stage fusion schemes. The code, as an mmdetection3D project plugin,
will be publicly available soon.
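The abstract's fusion recipe (separate per-modality encoders, then cross-modality and multi-modality attention) can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch rendering of that idea, not the authors' released code; the module names, dimensions, and head counts are all assumptions:

```python
import torch
import torch.nn as nn

class CrossModalityFusion(nn.Module):
    """Illustrative intermediate-level fusion: each modality attends to the
    other, then a multi-modality attention mixes the combined token stream.
    A sketch of the idea in the abstract, not official mmFUSION code."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Cross-modality attention: queries from one modality, keys/values from the other.
        self.cam_to_lidar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lidar_to_cam = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Multi-modality attention over the concatenated token streams.
        self.multi = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cam_feat, lidar_feat):
        # cam_feat, lidar_feat: (B, N_tokens, dim) from separate per-modality encoders.
        cam_enh, _ = self.cam_to_lidar(cam_feat, lidar_feat, lidar_feat)
        lidar_enh, _ = self.lidar_to_cam(lidar_feat, cam_feat, cam_feat)
        # Residual connections preserve each modality's own information.
        fused = torch.cat([cam_feat + cam_enh, lidar_feat + lidar_enh], dim=1)
        fused, _ = self.multi(fused, fused, fused)
        return self.norm(fused)  # multi-modal features for a 3D detection head

# Usage with dummy tokens flattened from each encoder's feature volume.
cam = torch.randn(2, 400, 256)    # e.g. flattened camera feature map
lidar = torch.randn(2, 600, 256)  # e.g. flattened voxel/BEV features
out = CrossModalityFusion()(cam, lidar)  # -> (2, 1000, 256)
```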
Related papers
- Progressive Multi-Modal Fusion for Robust 3D Object Detection [12.048303829428452]
Existing methods perform sensor fusion in a single view by projecting features from both modalities into either the Bird's Eye View (BEV) or the Perspective View (PV).
We propose ProFusion3D, a progressive fusion framework that combines features in both BEV and PV at both intermediate and object query levels.
Our architecture hierarchically fuses local and global features, enhancing the robustness of 3D object detection.
arXiv Detail & Related papers (2024-10-09T22:57:47Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework that fuses information from RGB images and LiDAR point clouds at points of interest (PoIs).
Our approach maintains the view of each modality and obtains multi-modal features through computation-friendly projection and interpolation (a minimal sketch of this projection primitive appears after this list).
We conducted extensive experiments on the nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
- Multi-Modal 3D Object Detection by Box Matching [109.43430123791684]
We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection.
With the learned assignments between 3D and 2D object proposals, fusion for detection can be effectively performed by combining their RoI features.
arXiv Detail & Related papers (2023-05-12T18:08:51Z)
- MMDR: A Result Feature Fusion Object Detection Approach for Autonomous System [5.499393552545591]
The proposed approach, called Multi-Modal Detector based on Result features (MMDR), is designed to work for both 2D and 3D object detection tasks.
The MMDR model incorporates shallow global features during the feature fusion stage, endowing the model with the ability to perceive background information.
arXiv Detail & Related papers (2023-04-19T12:28:42Z)
- Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with a hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec-3D AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z)
- MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection [17.295359521427073]
We propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection.
In the first stage, our multi-branch feature extraction network utilizes Adaptive Attention Fusion modules to produce cross-modal fusion features from single-modal semantic features.
In the second stage, we use a region-of-interest (RoI) pooled fusion module to generate enhanced local features for refinement.
arXiv Detail & Related papers (2021-08-29T15:40:15Z)
- Multi-View Adaptive Fusion Network for 3D Object Detection [14.506796247331584]
3D object detection based on LiDAR-camera fusion is an emerging research theme for autonomous driving.
We propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view, and camera view images as inputs for 3D object detection.
We design an end-to-end learnable network named MVAF-Net to integrate these two components.
arXiv Detail & Related papers (2020-11-02T00:06:01Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
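Several entries above (e.g. PoIFusion, MSMDFusion) rely on the same primitive: projecting 3D LiDAR points into the image plane and sampling camera features at those locations. The sketch below shows one minimal, hypothetical way to write that step in PyTorch; the function name, the pinhole-only geometry, and the omission of extrinsics, multiple cameras, and out-of-view handling are all simplifying assumptions:

```python
import torch
import torch.nn.functional as F

def sample_camera_features(points, img_feat, K):
    """Project 3D points (camera frame) into the image and bilinearly sample
    per-point camera features. Illustrative only; real pipelines also handle
    extrinsics, multiple cameras, and points that fall outside the view.

    points:   (N, 3) xyz in the camera coordinate frame
    img_feat: (C, H, W) feature map from an image backbone
    K:        (3, 3) pinhole intrinsics
    """
    C, H, W = img_feat.shape
    uvw = (K @ points.T).T                          # (N, 3) homogeneous pixel coords
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)   # perspective divide by depth
    # Normalize pixel coordinates to [-1, 1] as required by grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    grid = grid.view(1, 1, -1, 2)                   # (1, 1, N, 2)
    feat = F.grid_sample(img_feat[None], grid, align_corners=True)
    return feat.view(C, -1).T                       # (N, C) per-point camera features
```

The sampled per-point features can then be concatenated with, or attended against, the corresponding LiDAR features; the fusion operator applied on top is where the listed methods differ.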
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.