A novel multimodal fusion network based on a joint coding model for lane
line segmentation
- URL: http://arxiv.org/abs/2103.11114v1
- Date: Sat, 20 Mar 2021 06:47:58 GMT
- Title: A novel multimodal fusion network based on a joint coding model for lane
line segmentation
- Authors: Zhenhong Zou, Xinyu Zhang, Huaping Liu, Zhiwei Li, Amir Hussain and
Jun Li
- Abstract summary: We introduce a novel multimodal fusion architecture from an information theory perspective.
We demonstrate its practical utility using LiDAR-camera fusion networks.
Our optimal fusion network achieves 85%+ lane line accuracy and 98.7%+ overall accuracy.
- Score: 22.89466867866239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has recently been growing interest in utilizing multimodal sensors to
achieve robust lane line segmentation. In this paper, we introduce a novel
multimodal fusion architecture from an information theory perspective, and
demonstrate its practical utility using Light Detection and Ranging (LiDAR)-camera
fusion networks. In particular, we develop, for the first time, a multimodal fusion
network as a joint coding model, in which every node, layer, and pipeline is
represented as a channel, so that forward propagation corresponds to information
transmission through those channels. This view allows us to analyze the effect of
different fusion approaches both qualitatively and quantitatively. We argue that the
optimal fusion architecture depends on the essential capacity and its allocation,
which are in turn determined by the source and the channel. To test this multimodal
fusion hypothesis, we progressively derive a series of multimodal models from the
proposed fusion methods and evaluate them on the KITTI and A2D2 datasets. Our optimal
fusion network achieves 85%+ lane line accuracy and 98.7%+ overall accuracy. The
performance gaps among the models will inform future research on optimal fusion
algorithms in the deep multimodal learning community.
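To make the joint-coding view above concrete, the following is a minimal, hypothetical sketch of a mid-level LiDAR-camera fusion network for lane line segmentation. It is not the authors' implementation: the module names, channel sizes, and input resolutions are assumptions made here for illustration. Each encoder branch plays the role of a channel carrying one source, and the fusion stage jointly re-encodes both streams before per-pixel classification.

```python
# Illustrative sketch only (assumed architecture, not the paper's code):
# mid-level LiDAR-camera fusion for lane line segmentation in PyTorch.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions; each block is one "channel" stage in the coding view.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class MidFusionLaneNet(nn.Module):
    def __init__(self, cam_ch=3, lidar_ch=1, base=32, n_classes=2):
        super().__init__()
        self.cam_enc = conv_block(cam_ch, base)      # camera pipeline
        self.lidar_enc = conv_block(lidar_ch, base)  # LiDAR pipeline (e.g. projected range/intensity map)
        self.fuse = conv_block(2 * base, base)       # mid-level fusion: joint re-encoding of both sources
        self.head = nn.Conv2d(base, n_classes, 1)    # per-pixel lane / background logits

    def forward(self, cam, lidar):
        f_cam = self.cam_enc(cam)
        f_lidar = self.lidar_enc(lidar)
        fused = self.fuse(torch.cat([f_cam, f_lidar], dim=1))  # concatenate, then jointly encode
        return self.head(fused)

# Usage: the camera image and the LiDAR projection must share spatial resolution.
net = MidFusionLaneNet()
logits = net(torch.randn(1, 3, 256, 512), torch.randn(1, 1, 256, 512))
print(logits.shape)  # torch.Size([1, 2, 256, 512])
```

Moving the fusion stage earlier (input concatenation) or later (decision averaging) changes how capacity is allocated between the modality-specific and shared parts of the network, which is the design axis the paper's coding model is meant to analyze.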
Related papers
- MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation [13.624431305114564]
This paper introduces a pioneering Multi-modal Multi-class Late Fusion method, which uses late fusion to enable multi-class detection.
Experiments conducted on the KITTI validation and official test datasets illustrate substantial performance improvements.
Our approach incorporates uncertainty analysis into the classification fusion process, rendering our model more transparent and trustworthy.
arXiv Detail & Related papers (2024-10-11T11:58:35Z)
- FusionBench: A Comprehensive Benchmark of Deep Model Fusion [78.80920533793595]
Deep model fusion is a technique that unifies the predictions or parameters of several deep neural networks into a single model.
FusionBench is the first comprehensive benchmark dedicated to deep model fusion.
arXiv Detail & Related papers (2024-06-05T13:54:28Z)
- Multimodal Multi-loss Fusion Network for Sentiment Analysis [3.8611070161950902]
This paper investigates the optimal selection and fusion of feature encoders across multiple modalities to improve sentiment detection.
We compare different fusion methods and examine the impact of multi-loss training within the multi-modality fusion network.
We have found that integrating context significantly enhances model performance.
arXiv Detail & Related papers (2023-08-01T03:54:27Z)
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method towards multimodal fusion via seeking a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
- Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides theoretical justification from the generalization perspective, under a widely used multimodal fusion framework.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which can improve the performance in terms of classification accuracy and model robustness.
arXiv Detail & Related papers (2023-06-03T08:32:35Z)
- IMF: Interactive Multimodal Fusion Model for Link Prediction [13.766345726697404]
We introduce a novel Interactive Multimodal Fusion (IMF) model to integrate knowledge from different modalities.
Our approach has been demonstrated to be effective through empirical evaluations on several real-world datasets.
arXiv Detail & Related papers (2023-03-20T01:20:02Z)
- Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion [112.27103169303184]
Multimodal Knowledge Graphs (MKGs) organize visual-text factual knowledge.
MKGformer achieves SOTA performance on four datasets spanning multimodal link prediction, multimodal relation extraction (RE), and multimodal named entity recognition (NER).
arXiv Detail & Related papers (2022-05-04T23:40:04Z)
- Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion [63.72912507445662]
We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network.
We first verify that multimodal features can be learned within a single shared network by merely maintaining modality-specific batch normalization layers in the encoder.
We then propose a bidirectional multi-layer fusion scheme in which multimodal features can be exploited progressively.
arXiv Detail & Related papers (2021-08-11T03:42:13Z)
- Two Headed Dragons: Multimodal Fusion and Cross Modal Transactions [14.700807572189412]
We propose a novel transformer based fusion method for HSI and LiDAR modalities.
The model is composed of stacked autoencoders that harness cross key-value pairs between the HSI and LiDAR modalities.
We test our model on Houston (Data Fusion Contest - 2013) and MUUFL Gulfport datasets and achieve competitive results.
arXiv Detail & Related papers (2021-07-24T11:33:37Z)
- Deep Multimodal Fusion by Channel Exchanging [87.40768169300898]
This paper proposes a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities.
The validity of this exchange process is guaranteed by sharing convolutional filters while keeping separate BN layers across modalities, which, as an added benefit, makes the multimodal architecture almost as compact as a unimodal network (see the sketch below).
arXiv Detail & Related papers (2020-11-10T09:53:20Z)
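The channel-exchanging mechanism summarized in the last entry can be illustrated with a short, hypothetical PyTorch sketch. This is not the authors' code: the block structure, threshold value, and layer sizes are assumptions made here for clarity. The key idea kept from the summary is that the two modality streams share convolutional filters but keep separate BatchNorm layers, and channels whose BN scaling factor is near zero are filled with the other modality's response.

```python
# Illustrative sketch only (assumed implementation of the channel-exchange idea).
import torch
import torch.nn as nn

class ExchangeBlock(nn.Module):
    def __init__(self, channels=32, threshold=1e-2):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # shared across modalities
        self.bn_a = nn.BatchNorm2d(channels)  # modality-specific BN
        self.bn_b = nn.BatchNorm2d(channels)
        self.threshold = threshold

    def forward(self, x_a, x_b):
        y_a = self.bn_a(self.conv(x_a))
        y_b = self.bn_b(self.conv(x_b))
        # Channels with near-zero BN scale carry little information for their own
        # modality, so they are replaced by the other modality's response. In the
        # original method, a sparsity penalty on the BN scaling factors is used
        # during training to drive uninformative channels toward zero.
        weak_a = self.bn_a.weight.abs() < self.threshold  # (C,) boolean mask
        weak_b = self.bn_b.weight.abs() < self.threshold
        out_a = torch.where(weak_a.view(1, -1, 1, 1), y_b, y_a)
        out_b = torch.where(weak_b.view(1, -1, 1, 1), y_a, y_b)
        return torch.relu(out_a), torch.relu(out_b)

# Usage with two 32-channel feature maps (e.g. an RGB stream and a depth stream).
blk = ExchangeBlock()
a, b = blk(torch.randn(2, 32, 64, 64), torch.randn(2, 32, 64, 64))
```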
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.