TransFusionOdom: Interpretable Transformer-based LiDAR-Inertial Fusion
Odometry Estimation
- URL: http://arxiv.org/abs/2304.07728v2
- Date: Wed, 26 Apr 2023 00:44:25 GMT
- Title: TransFusionOdom: Interpretable Transformer-based LiDAR-Inertial Fusion
Odometry Estimation
- Authors: Leyuan Sun, Guanqun Ding, Yue Qiu, Yusuke Yoshiyasu and Fumio Kanehiro
- Abstract summary: We propose an end-to-end supervised Transformer-based LiDAR-Inertial fusion framework (namely TransFusionOdom) for odometry estimation.
We show different fusion approaches for homogeneous and heterogeneous modalities to address the overfitting problem.
Exhaustive ablation studies evaluate different multi-modal fusion strategies to verify the performance of the proposed fusion strategy.
- Score: 7.778461949427663
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-modal fusion of sensors is a commonly used approach to enhance the
performance of odometry estimation, which is also a fundamental module for
mobile robots. However, how to perform fusion among different modalities in a
supervised sensor fusion odometry estimation task remains a challenging open
question. Simple operations, such as element-wise summation and concatenation,
cannot assign adaptive attentional weights to incorporate different modalities
efficiently, which makes it difficult to achieve competitive odometry results.
Recently, the Transformer
architecture has shown potential for multi-modal fusion tasks, particularly in
the domains of vision with language. In this work, we propose an end-to-end
supervised Transformer-based LiDAR-Inertial fusion framework (namely
TransFusionOdom) for odometry estimation. The multi-attention fusion module
demonstrates different fusion approaches for homogeneous and heterogeneous
modalities to address the overfitting problem that can arise from blindly
increasing the complexity of the model. Additionally, to interpret the learning
process of the Transformer-based multi-modal interactions, a general
visualization approach is introduced to illustrate the interactions between
modalities. Moreover, exhaustive ablation studies evaluate different
multi-modal fusion strategies to verify the performance of the proposed fusion
strategy. A synthetic multi-modal dataset is made public to validate the
generalization ability of the proposed fusion strategy, which also works for
other combinations of different modalities. Quantitative and qualitative
odometry evaluations on the KITTI dataset verify that the proposed
TransFusionOdom achieves superior performance compared with other related
works.
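To make the fusion contrast concrete, below is a minimal, hypothetical PyTorch sketch of attention-based LiDAR-Inertial feature fusion in the spirit of the abstract: self-attention within a modality plus cross-attention between modalities replaces naive summation or concatenation, and the returned attention weights are the kind of signal a cross-modal interpretability visualization could inspect. The encoders, token counts, dimensions, and pose head are illustrative assumptions, not the authors' released TransFusionOdom architecture.

```python
# Hypothetical sketch (not the authors' code): attention-based fusion of LiDAR
# and IMU feature tokens instead of element-wise summation or concatenation.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Self-attention within the LiDAR tokens (homogeneous interactions).
        self.self_attn = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                    batch_first=True)
        # Cross-attention lets LiDAR tokens attend to IMU tokens
        # (heterogeneous fusion with adaptive attentional weights).
        self.cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads,
                                                batch_first=True)
        # Placeholder pose head: 3-DoF translation + 4-DoF quaternion.
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 7))

    def forward(self, lidar_tokens, imu_tokens):
        # lidar_tokens: (B, N_l, dim) from some LiDAR encoder (assumed)
        # imu_tokens:   (B, N_i, dim) from some IMU sequence encoder (assumed)
        lidar_tokens = self.self_attn(lidar_tokens)
        fused, attn_weights = self.cross_attn(query=lidar_tokens,
                                              key=imu_tokens, value=imu_tokens)
        pose = self.head(fused.mean(dim=1))  # pool tokens, regress relative pose
        return pose, attn_weights            # weights can be visualized

if __name__ == "__main__":
    lidar = torch.randn(2, 64, 256)  # placeholder LiDAR feature tokens
    imu = torch.randn(2, 10, 256)    # placeholder IMU feature tokens
    pose, attn = AttentionFusion()(lidar, imu)
    print(pose.shape, attn.shape)    # (2, 7) and (2, 64, 10)
```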
Related papers
- Gated Recursive Fusion: A Stateful Approach to Scalable Multimodal Transformers [0.0]
Gated Recurrent Fusion (GRF) is a novel architecture that captures the power of cross-modal attention within a linearly scalable, recurrent pipeline. Our work presents a robust and efficient paradigm for powerful, scalable multimodal representation learning.
arXiv Detail & Related papers (2025-07-01T09:33:38Z) - COMO: Cross-Mamba Interaction and Offset-Guided Fusion for Multimodal Object Detection [9.913133285133998]
Single-modal object detection tasks often experience performance degradation when encountering diverse scenarios.
Multimodal object detection tasks can offer more comprehensive information about object features by integrating data from various modalities.
In this paper, we propose a novel approach called the CrOss-Mamba interaction and Offset-guided fusion framework.
arXiv Detail & Related papers (2024-12-24T01:14:48Z) - Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation [5.022049774600693]
This paper proposes a generalized multimodal fusion method (GMF) via the Poisson-Nernst-Planck (PNP) equation.
We show that the proposed GMF achieves close to the state-of-the-art (SOTA) accuracy while utilizing fewer parameters and computational resources.
arXiv Detail & Related papers (2024-10-20T19:15:28Z) - SeaDATE: Remedy Dual-Attention Transformer with Semantic Alignment via Contrast Learning for Multimodal Object Detection [18.090706979440334]
Multimodal object detection leverages diverse modal information to enhance the accuracy and robustness of detectors.
Current methods merely stack Transformer-guided fusion techniques without exploring their capability to extract features at various depth layers of the network.
In this paper, we introduce an accurate and efficient object detection method named SeaDATE.
arXiv Detail & Related papers (2024-10-15T07:26:39Z) - Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD)
It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z) - Coupled generator decomposition for fusion of electro- and magnetoencephalography data [1.7102695043811291]
Data fusion modeling can identify common features across diverse data sources while accounting for source-specific variability.
We introduce the concept of a coupled generator decomposition and demonstrate how it generalizes sparse principal component analysis for data fusion.
arXiv Detail & Related papers (2024-03-02T12:09:16Z) - A Low-rank Matching Attention based Cross-modal Feature Fusion Method for Conversational Emotion Recognition [54.44337276044968]
We introduce a novel and lightweight cross-modal feature fusion method called Low-Rank Matching Attention Method (LMAM)
LMAM effectively captures contextual emotional semantic information in conversations while mitigating the quadratic complexity issue caused by the self-attention mechanism.
Experimental results verify the superiority of LMAM compared with other popular cross-modal fusion methods on the premise of being more lightweight.
arXiv Detail & Related papers (2023-06-16T16:02:44Z) - A Task-guided, Implicitly-searched and Meta-initialized Deep Model for
Image Fusion [69.10255211811007]
We present a Task-guided, Implicitly-searched and Meta-initialized (TIM) deep model to address the image fusion problem in a challenging real-world scenario.
Specifically, we propose a constrained strategy to incorporate information from downstream tasks to guide the unsupervised learning process of image fusion.
Within this framework, we then design an implicit search scheme to automatically discover compact architectures for our fusion model with high efficiency.
arXiv Detail & Related papers (2023-05-25T08:54:08Z) - Equivariant Multi-Modality Image Fusion [124.11300001864579]
We propose the Equivariant Multi-Modality imAge fusion paradigm for end-to-end self-supervised learning.
Our approach is rooted in the prior knowledge that natural imaging responses are equivariant to certain transformations.
Experiments confirm that EMMA yields high-quality fusion results for infrared-visible and medical images.
arXiv Detail & Related papers (2023-05-19T05:50:24Z) - MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality
Hybrid [40.745848169903105]
Multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs.
MMEA algorithms rely on KG-level modality fusion strategies for multi-modal entity representation.
This paper introduces MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid.
arXiv Detail & Related papers (2022-12-29T20:49:58Z) - Relational Reasoning via Set Transformers: Provable Efficiency and
Applications to MARL [154.13105285663656]
A cooperative Multi-Agent Reinforcement Learning (MARL) framework with permutation-invariant agents has achieved tremendous empirical success in real-world applications.
Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of the relational reasoning in existing works.
We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents respectively, which mitigates the curse of many agents.
arXiv Detail & Related papers (2022-09-20T16:42:59Z) - Multimodal Token Fusion for Vision Transformers [54.81107795090239]
We propose a multimodal token fusion method (TokenFusion) for transformer-based vision tasks.
To effectively fuse multiple modalities, TokenFusion dynamically detects uninformative tokens and substitutes these tokens with projected and aggregated inter-modal features.
The design of TokenFusion allows the transformer to learn correlations among multimodal features, while the single-modal transformer architecture remains largely intact.
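A minimal sketch of this token-substitution idea appears after this list.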
arXiv Detail & Related papers (2022-04-19T07:47:50Z) - Cross-Modality Fusion Transformer for Multispectral Object Detection [0.0]
Multispectral image pairs can provide combined information, making object detection applications more reliable and robust.
We present a simple yet effective cross-modality feature fusion approach, named Cross-Modality Fusion Transformer (CFT) in this paper.
arXiv Detail & Related papers (2021-10-30T15:34:12Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
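Of the related works above, TokenFusion describes its mechanism concretely enough to sketch. Below is a minimal, hypothetical illustration of that token-substitution idea: tokens scored as uninformative in one modality are replaced by projected features from the other. The scoring head, threshold, projection, and aligned token shapes are assumptions made for illustration, not the paper's released implementation.

```python
# Hypothetical sketch of TokenFusion-style token substitution (illustrative only).
import torch
import torch.nn as nn

class TokenSubstitution(nn.Module):
    def __init__(self, dim=256, threshold=0.02):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())  # per-token informativeness
        self.project = nn.Linear(dim, dim)                           # inter-modal projection
        self.threshold = threshold

    def forward(self, tokens_a, tokens_b):
        # tokens_a, tokens_b: (B, N, dim) token sequences from two modalities,
        # assumed to be position-aligned for this sketch.
        s = self.score(tokens_a)             # (B, N, 1) informativeness scores
        mask = (s < self.threshold).float()  # 1 where a token of modality A is uninformative
        substitute = self.project(tokens_b)  # features borrowed from modality B
        return mask * substitute + (1.0 - mask) * tokens_a

if __name__ == "__main__":
    a, b = torch.randn(2, 32, 256), torch.randn(2, 32, 256)
    print(TokenSubstitution()(a, b).shape)  # torch.Size([2, 32, 256])
```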