MMDR: A Result Feature Fusion Object Detection Approach for Autonomous
System
- URL: http://arxiv.org/abs/2304.09609v1
- Date: Wed, 19 Apr 2023 12:28:42 GMT
- Title: MMDR: A Result Feature Fusion Object Detection Approach for Autonomous
System
- Authors: Wendong Zhang
- Abstract summary: The proposed approach, called Multi-Modal Detector based on Result features (MMDR), is designed to work for both 2D and 3D object detection tasks.
The MMDR model incorporates shallow global features during the feature fusion stage, endowing the model with the ability to perceive background information.
- Score: 5.499393552545591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object detection has been extensively utilized in autonomous systems in
recent years, encompassing both 2D and 3D object detection. Recent research in
this field has primarily centered around multimodal approaches for addressing
this issue. In this paper, a multimodal fusion approach based on result
feature-level fusion is proposed. This method utilizes the outcome features
generated from single modality sources, and fuses them for downstream
tasks. Based on this method, a new post-fusing network is proposed for
multimodal object detection, which leverages the single modality outcomes as
features. The proposed approach, called Multi-Modal Detector based on Result
features (MMDR), is designed to work for both 2D and 3D object detection tasks.
Compared to previous multimodal models, the proposed approach in this paper
performs feature fusion at a later stage, enabling better representation of the
deep-level features of single modality sources. Additionally, the MMDR model
incorporates shallow global features during the feature fusion stage, endowing
the model with the ability to perceive background information and the overall
input, thereby avoiding issues such as missed detections.
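The late-fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the MMDR network: the function name `fuse_result_features`, the assumption that candidates from the two modalities are already matched one-to-one, and the single linear layer with a sigmoid are all hypothetical stand-ins chosen for brevity. The sketch only shows the data flow: single-modality result vectors are treated as features, concatenated with a shallow global feature, and scored together.

```python
import math
from typing import List, Sequence


def fuse_result_features(
    cam_results: Sequence[Sequence[float]],
    lidar_results: Sequence[Sequence[float]],
    global_feat: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> List[float]:
    """Score each candidate from concatenated single-modality results.

    Hypothetical illustration only. Assumes cam_results[i] and
    lidar_results[i] describe the same candidate (matching is out of scope).
    """
    if len(cam_results) != len(lidar_results):
        raise ValueError("candidates must be matched one-to-one")

    scores = []
    for cam, lidar in zip(cam_results, lidar_results):
        # Result-level fusion: the detectors' outputs are the features.
        # The shared global feature gives every candidate scene context.
        fused = list(cam) + list(lidar) + list(global_feat)
        if len(fused) != len(weights):
            raise ValueError("weights must match the fused feature length")
        logit = sum(w * x for w, x in zip(weights, fused)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid confidence
    return scores


# Usage with made-up numbers: one candidate, 2 values per modality,
# and a 1-dimensional global feature.
fused_scores = fuse_result_features(
    cam_results=[[0.8, 0.1]],
    lidar_results=[[0.7, 0.2]],
    global_feat=[0.5],
    weights=[1.0, 0.0, 1.0, 0.0, 0.5],
)
print(fused_scores)
```

In a trained model the weights would be learned and the fusion head would be deeper. The point here is only that fusion happens after each modality has produced its own results.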
Related papers
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework to fuse information of RGB images and LiDAR point clouds at the points of interest (PoIs).
Our approach maintains the view of each modality and obtains multi-modal features by computation-friendly projection and computation.
We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- mmFUSION: Multimodal Fusion for 3D Objects Detection [18.401155770778757]
Multi-sensor fusion is essential for accurate 3D object detection in self-driving systems.
In this paper, we propose a new intermediate-level multi-modal fusion approach to overcome these challenges.
The code with the mmdetection3D project plugin will be publicly available soon.
arXiv Detail & Related papers (2023-11-07T15:11:27Z)
- MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named MLF-DET, for high-performance cross-modal 3D object DETection.
For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features.
For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
arXiv Detail & Related papers (2023-07-18T11:26:02Z)
- Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with a hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec-3D AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- Depth-Cooperated Trimodal Network for Video Salient Object Detection [13.727763221832532]
We propose a depth-cooperated trimodal network called DCTNet for video salient object detection (VSOD).
To this end, we first generate depth from RGB frames, and then propose an approach to treat the three modalities unequally.
We also introduce a refinement fusion module (RFM) to suppress noises in each modality and select useful information dynamically for further feature refinement.
arXiv Detail & Related papers (2022-02-12T13:04:16Z)
- MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection [17.295359521427073]
We propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection.
In the first stage, our multi-branch feature extraction network utilizes Adaptive Attention Fusion modules to produce cross-modal fusion features from single-modal semantic features.
In the second stage, we use a region of interest (RoI)-pooled fusion module to generate enhanced local features for refinement.
arXiv Detail & Related papers (2021-08-29T15:40:15Z)
- Multi-View Adaptive Fusion Network for 3D Object Detection [14.506796247331584]
3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving.
We propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view and camera view images as inputs for 3D object detection.
We design an end-to-end learnable network named MVAF-Net to integrate these two components.
arXiv Detail & Related papers (2020-11-02T00:06:01Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)