Multi-Modality Cascaded Fusion Technology for Autonomous Driving
- URL: http://arxiv.org/abs/2002.03138v1
- Date: Sat, 8 Feb 2020 10:59:18 GMT
- Title: Multi-Modality Cascaded Fusion Technology for Autonomous Driving
- Authors: Hongwu Kuang, Xiaodong Liu, Jingwei Zhang, Zicheng Fang
- Abstract summary: We propose a general multi-modality cascaded fusion framework, exploiting the advantages of decision-level and feature-level fusion.
In the fusion process, dynamic coordinate alignment (DCA) is conducted to reduce the error between sensors from different modalities.
The proposed step-by-step cascaded fusion framework is more interpretable and flexible compared to end-to-end fusion methods.
- Score: 18.93984652806857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modality fusion is essential to the stability of autonomous driving
systems. In this paper, we propose a general multi-modality cascaded fusion
framework that exploits the advantages of decision-level and feature-level
fusion, utilizing target position, size, velocity, appearance and confidence to
achieve accurate fusion results. In the fusion process, dynamic coordinate
alignment (DCA) is conducted to reduce the error between sensors from different
modalities. In addition, since the calculation of the affinity matrix is the
core module of sensor fusion, we propose an affinity loss that improves the
performance of the deep affinity network (DAN). Finally, the proposed
step-by-step cascaded fusion framework is more interpretable and flexible than
end-to-end fusion methods. Extensive experiments on the NuScenes [2] dataset
show that our approach achieves state-of-the-art performance.
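To make the pipeline in the abstract more concrete, below is a minimal NumPy sketch of two of the steps it names: a stand-in for dynamic coordinate alignment (DCA) and a cue-based pairwise affinity matrix with a simple binary cross-entropy affinity loss. The function names, cue weights, and loss form are illustrative assumptions; the paper's actual DCA module and deep affinity network (DAN) are not reproduced here.

```python
# Minimal sketch (assumed interfaces, not the paper's code) of coordinate
# alignment and cue-based association for multi-sensor fusion.
import numpy as np

def dynamic_coordinate_alignment(pos_a, pos_b, matches):
    """Estimate and remove a systematic offset between two sensors' detections.
    pos_a: (N, 2) positions from sensor A; pos_b: (M, 2) positions from sensor B;
    matches: list of (i, j) index pairs previously associated across sensors."""
    if not matches:
        return pos_b
    offset = np.mean([pos_a[i] - pos_b[j] for i, j in matches], axis=0)
    return pos_b + offset  # shift sensor-B detections toward sensor-A's frame

def affinity_matrix(dets_a, dets_b, w=(0.4, 0.2, 0.2, 0.1, 0.1)):
    """Combine position, size, velocity, appearance, and confidence cues into a
    pairwise affinity matrix A[i, j]; each detection is a dict with 'pos', 'size',
    'vel', 'feat' (appearance embedding) arrays and a scalar 'conf'."""
    A = np.zeros((len(dets_a), len(dets_b)))
    for i, a in enumerate(dets_a):
        for j, b in enumerate(dets_b):
            pos  = np.exp(-np.linalg.norm(a['pos']  - b['pos']))   # position proximity
            size = np.exp(-np.linalg.norm(a['size'] - b['size']))  # size similarity
            vel  = np.exp(-np.linalg.norm(a['vel']  - b['vel']))   # velocity similarity
            app  = float(a['feat'] @ b['feat'] /
                         (np.linalg.norm(a['feat']) * np.linalg.norm(b['feat']) + 1e-8))
            conf = a['conf'] * b['conf']                            # joint confidence
            A[i, j] = np.dot(w, [pos, size, vel, app, conf])
    return A

def affinity_loss(A_pred, A_gt, eps=1e-8):
    """Binary cross-entropy between predicted affinities and a 0/1 ground-truth
    assignment matrix; a generic stand-in for a loss used to train an affinity network."""
    A_pred = np.clip(A_pred, eps, 1 - eps)
    return float(-(A_gt * np.log(A_pred) + (1 - A_gt) * np.log(1 - A_pred)).mean())
```

In a full system, the affinity matrix would typically feed an assignment step (e.g., the Hungarian algorithm), and the resulting matches would drive the subsequent decision-level and feature-level fusion stages.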
Related papers
- MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation [13.624431305114564]
This paper introduces a pioneering Multi-modal Multi-class Late Fusion method, enabling multi-class detection through late fusion.
Experiments conducted on the KITTI validation and official test datasets illustrate substantial performance improvements.
Our approach incorporates uncertainty analysis into the classification fusion process, rendering our model more transparent and trustworthy.
arXiv Detail & Related papers (2024-10-11T11:58:35Z) - Progressively Modality Freezing for Multi-Modal Entity Alignment [27.77877721548588]
We propose a novel strategy of progressive modality freezing, called PMF, that focuses on alignment-relevant features.
Notably, our approach introduces a pioneering cross-modal association loss to foster modal consistency.
Empirical evaluations across nine datasets confirm PMF's superiority.
arXiv Detail & Related papers (2024-07-23T04:22:30Z) - How Intermodal Interaction Affects the Performance of Deep Multimodal Fusion for Mixed-Type Time Series [3.6958071416494414]
Mixed-type time series (MTTS) is a bimodal data type common in many domains, such as healthcare, finance, environmental monitoring, and social media.
The integration of both modalities through multimodal fusion is a promising approach for processing MTTS.
We present a comprehensive evaluation of several deep multimodal fusion approaches for MTTS forecasting.
arXiv Detail & Related papers (2024-06-21T12:26:48Z) - E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection [21.185032466325737]
We introduce E2E-MFD, a novel end-to-end algorithm for multimodal fusion detection.
E2E-MFD streamlines the process, achieving high performance with a single training phase.
Our extensive testing on multiple public datasets reveals E2E-MFD's superior capabilities.
arXiv Detail & Related papers (2024-03-14T12:12:17Z) - MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named MLF-DET, for high-performance cross-modal 3D object DETection.
For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features.
For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
arXiv Detail & Related papers (2023-07-18T11:26:02Z) - Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method towards multimodal fusion via seeking a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z) - Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides a theoretical analysis of this question under one of the most popular multimodal fusion frameworks, from the generalization perspective.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which can improve the performance in terms of classification accuracy and model robustness.
arXiv Detail & Related papers (2023-06-03T08:32:35Z) - Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z) - MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
arXiv Detail & Related papers (2022-01-24T17:48:04Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z) - Memory based fusion for multi-modal deep learning [39.29589204750581]
We present a novel Memory-based Attentive Fusion layer, which fuses modes by incorporating both the current features and long-term dependencies in the data.
arXiv Detail & Related papers (2020-07-16T02:05:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.