Two Headed Dragons: Multimodal Fusion and Cross Modal Transactions
- URL: http://arxiv.org/abs/2107.11585v1
- Date: Sat, 24 Jul 2021 11:33:37 GMT
- Title: Two Headed Dragons: Multimodal Fusion and Cross Modal Transactions
- Authors: Rupak Bose, Shivam Pande, Biplab Banerjee
- Abstract summary: We propose a novel transformer-based fusion method for HSI and LiDAR modalities.
The model is composed of stacked autoencoders that harness cross key-value pairs between HSI and LiDAR.
We test our model on Houston (Data Fusion Contest - 2013) and MUUFL Gulfport datasets and achieve competitive results.
- Score: 14.700807572189412
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the field of remote sensing evolves, we witness the accumulation of
information from several modalities, such as multispectral (MS), hyperspectral
(HSI), LiDAR, etc. Each of these modalities possesses its own distinct
characteristics, and when combined synergistically they perform very well in
recognition and classification tasks. However, fusing multiple modalities in
remote sensing is cumbersome because their domains are highly disparate.
Furthermore, existing methods do not facilitate cross-modal interactions. To
this end, we propose a novel transformer-based fusion method for the HSI and
LiDAR modalities. The model is composed of stacked autoencoders that harness
cross key-value pairs between HSI and LiDAR, thus establishing communication
between the two modalities, while simultaneously using CNNs to extract the
spectral and spatial information from HSI and LiDAR. We test our model on the
Houston (Data Fusion Contest 2013) and MUUFL Gulfport datasets and achieve
competitive results.
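The described architecture, CNN feature extractors for each modality feeding attention layers in which one modality supplies the queries and the other supplies the keys and values, can be illustrated with a minimal PyTorch sketch. Everything below (module names, dimensions, the classification head, and the use of plain multi-head attention in place of the paper's stacked autoencoders) is an illustrative assumption, not the authors' implementation; the Houston-like shapes (144 HSI bands, 15 classes) are used only as an example.

```python
# Minimal sketch (not the authors' code): cross key-value attention between
# HSI and LiDAR feature streams, with small CNN stems for feature extraction.
# All module names, dimensions, and layer choices are illustrative assumptions.
import torch
import torch.nn as nn


class CNNStem(nn.Module):
    """Extracts per-pixel features from one modality with a small 2-D CNN."""
    def __init__(self, in_channels: int, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, H*W, dim): one token per spatial location
        f = self.net(x)
        return f.flatten(2).transpose(1, 2)


class CrossModalBlock(nn.Module):
    """One fusion block: each stream queries the other stream's keys/values."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.hsi_from_lidar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lidar_from_hsi = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hsi_tok, lidar_tok):
        # HSI tokens act as queries against LiDAR keys/values, and vice versa.
        h, _ = self.hsi_from_lidar(hsi_tok, lidar_tok, lidar_tok)
        l, _ = self.lidar_from_hsi(lidar_tok, hsi_tok, hsi_tok)
        return hsi_tok + h, lidar_tok + l  # residual connections


class TwoStreamFusionClassifier(nn.Module):
    def __init__(self, hsi_bands: int, lidar_channels: int,
                 num_classes: int, dim: int = 64, depth: int = 2):
        super().__init__()
        self.hsi_stem = CNNStem(hsi_bands, dim)
        self.lidar_stem = CNNStem(lidar_channels, dim)
        self.blocks = nn.ModuleList([CrossModalBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, hsi, lidar):
        h, l = self.hsi_stem(hsi), self.lidar_stem(lidar)
        for blk in self.blocks:
            h, l = blk(h, l)
        # Pool each stream and classify the concatenated fused feature.
        fused = torch.cat([h.mean(dim=1), l.mean(dim=1)], dim=-1)
        return self.head(fused)


if __name__ == "__main__":
    # Example: 11x11 patches with 144 HSI bands and a 1-channel LiDAR raster.
    model = TwoStreamFusionClassifier(hsi_bands=144, lidar_channels=1, num_classes=15)
    logits = model(torch.randn(2, 144, 11, 11), torch.randn(2, 1, 11, 11))
    print(logits.shape)  # torch.Size([2, 15])
```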
Related papers
- Fusion-Mamba for Cross-modality Object Detection [63.56296480951342]
Cross-modality fusion of information from different modalities effectively improves object detection performance.
We design a Fusion-Mamba block (FMB) to map cross-modal features into a hidden state space for interaction.
Our proposed approach outperforms state-of-the-art methods in mAP by 5.9% on the M3FD dataset and 4.9% on the FLIR-Aligned dataset.
arXiv Detail & Related papers (2024-04-14T05:28:46Z) - FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients [32.59184269562571]
We propose a multi-modal collaborative diffusion federated learning framework called FedDiff.
Our framework establishes a dual-branch diffusion-model feature extraction setup, where the two modalities are fed into separate encoder branches.
Considering the challenge of private and efficient communication between multiple clients, we embed the diffusion model into the federated learning communication structure.
arXiv Detail & Related papers (2023-11-16T02:29:37Z) - Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method for multimodal fusion that seeks a fixed point of the dynamic multimodal fusion process (a generic sketch of such a fixed-point fusion layer is given after this list).
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z) - Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Multimodal Hyperspectral Image Classification via Interconnected Fusion [12.41850641917384]
An Interconnected Fusion (IF) framework is proposed to explore the relationships across HSI and LiDAR modalities comprehensively.
Experiments have been conducted on three widely used datasets: Trento, MUUFL, and Houston.
arXiv Detail & Related papers (2023-04-02T09:46:13Z) - Weakly Aligned Feature Fusion for Multimodal Object Detection [52.15436349488198]
Multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned.
This problem makes it difficult to fuse multimodal features and complicates convolutional neural network (CNN) training.
In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem.
arXiv Detail & Related papers (2022-04-21T02:35:23Z) - Cross-Modality Fusion Transformer for Multispectral Object Detection [0.0]
Multispectral image pairs can provide combined information, making object detection applications more reliable and robust.
We present a simple yet effective cross-modality feature fusion approach, named Cross-Modality Fusion Transformer (CFT) in this paper.
arXiv Detail & Related papers (2021-10-30T15:34:12Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
The Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input owing to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z) - A novel multimodal fusion network based on a joint coding model for lane line segmentation [22.89466867866239]
We introduce a novel multimodal fusion architecture from an information theory perspective.
We demonstrate its practical utility using LiDAR camera fusion networks.
Our optimal fusion network achieves over 85% lane line accuracy and over 98.7% overall accuracy.
arXiv Detail & Related papers (2021-03-20T06:47:58Z)
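The deep-equilibrium fusion idea referenced in the list above treats the fused representation as the fixed point of a fusion operator. Below is a minimal, generic PyTorch sketch of such a fixed-point fusion layer; the fusion function, the plain forward-iteration solver, and all dimensions are assumptions for illustration and do not reproduce the cited paper's method (a full DEQ would also backpropagate through the fixed point implicitly rather than unrolling the loop).

```python
# Minimal sketch of deep-equilibrium-style fusion (not the cited paper's code):
# the fused representation z is defined implicitly as a fixed point
# z* = f(z*, x1, x2) and is approximated here by simple forward iteration.
import torch
import torch.nn as nn


class DEQFusion(nn.Module):
    def __init__(self, dim: int, max_iters: int = 30, tol: float = 1e-4):
        super().__init__()
        # Illustrative fusion operator; any contraction-like mapping works here.
        self.f = nn.Sequential(nn.Linear(3 * dim, dim), nn.Tanh())
        self.max_iters = max_iters
        self.tol = tol

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x1)
        for _ in range(self.max_iters):
            z_next = self.f(torch.cat([z, x1, x2], dim=-1))
            if (z_next - z).norm() < self.tol * (z.norm() + 1e-8):
                z = z_next
                break  # converged to (an approximation of) the fixed point
            z = z_next
        return z  # approximate fixed point used as the fused feature


if __name__ == "__main__":
    fuse = DEQFusion(dim=64)
    z = fuse(torch.randn(8, 64), torch.randn(8, 64))
    print(z.shape)  # torch.Size([8, 64])
```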
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.