Incomplete Multimodal Learning for Remote Sensing Data Fusion
- URL: http://arxiv.org/abs/2304.11381v1
- Date: Sat, 22 Apr 2023 12:16:52 GMT
- Title: Incomplete Multimodal Learning for Remote Sensing Data Fusion
- Authors: Yuxing Chen, Maofan Zhao, Lorenzo Bruzzone
- Abstract summary: The mechanism of connecting multimodal signals through the self-attention operation is a key factor in the success of multimodal Transformer networks in remote sensing data fusion tasks.
Traditional approaches assume access to all modalities during both training and inference, which can lead to severe degradation when dealing with modal-incomplete inputs in downstream applications.
Our proposed approach introduces a novel model for incomplete multimodal learning in the context of remote sensing data fusion.
- Score: 12.822457129596824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The mechanism of connecting multimodal signals through the self-attention
operation is a key factor in the success of multimodal Transformer networks in
remote sensing data fusion tasks. However, traditional approaches assume access
to all modalities during both training and inference, which can lead to severe
degradation when dealing with modal-incomplete inputs in downstream
applications. To address this limitation, our proposed approach introduces a
novel model for incomplete multimodal learning in the context of remote sensing
data fusion. This approach can be used in both supervised and self-supervised
pre-training paradigms and leverages additional learned fusion tokens in
combination with Bi-LSTM attention and masked self-attention mechanisms to
collect multimodal signals. The proposed approach employs reconstruction and
contrastive losses to facilitate fusion during pre-training while allowing random
modality combinations as inputs during network training. Our approach delivers
state-of-the-art performance on two multimodal datasets for building
instance/semantic segmentation and land-cover mapping tasks when dealing with
incomplete inputs during inference.
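To make the mechanism described above concrete, the following is a minimal sketch, assuming a PyTorch implementation, of how learned fusion tokens can collect signals from whichever modalities are present via masked self-attention, with random modality dropout and combined reconstruction and contrastive losses during pre-training. All names (FusionTokenEncoder, pretraining_loss), dimensions, and hyperparameters are illustrative assumptions rather than the authors' actual implementation, and the Bi-LSTM attention component mentioned in the abstract is omitted for brevity.

```python
# Minimal sketch, assuming PyTorch: learned fusion tokens attend over whichever
# modality tokens are present; absent modalities are masked out of self-attention,
# so random modality combinations can be used during training and inference.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionTokenEncoder(nn.Module):
    """Hypothetical encoder: per-modality linear embeddings plus a shared
    Transformer whose key-padding mask hides missing modalities."""

    def __init__(self, modality_dims, d_model=128, n_heads=4, n_layers=2, n_fusion=4):
        super().__init__()
        self.embed = nn.ModuleDict({m: nn.Linear(d, d_model) for m, d in modality_dims.items()})
        self.decode = nn.ModuleDict({m: nn.Linear(d_model, d) for m, d in modality_dims.items()})
        self.fusion_tokens = nn.Parameter(torch.randn(n_fusion, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, inputs):
        # inputs: dict {modality_name: (B, d_m) tensor, or None if the modality is missing}
        B = next(x.shape[0] for x in inputs.values() if x is not None)
        dev = self.fusion_tokens.device
        n_fusion, d_model = self.fusion_tokens.shape
        tokens = [self.fusion_tokens.expand(B, -1, -1)]
        keep = [torch.ones(B, n_fusion, dtype=torch.bool, device=dev)]
        for m, proj in self.embed.items():
            x = inputs.get(m)
            if x is None:  # missing modality: placeholder token, masked out of attention
                tokens.append(torch.zeros(B, 1, d_model, device=dev))
                keep.append(torch.zeros(B, 1, dtype=torch.bool, device=dev))
            else:
                tokens.append(proj(x).unsqueeze(1))
                keep.append(torch.ones(B, 1, dtype=torch.bool, device=dev))
        h = self.encoder(torch.cat(tokens, dim=1),
                         src_key_padding_mask=~torch.cat(keep, dim=1))
        fused = h[:, :n_fusion].mean(dim=1)                    # pooled fusion representation
        recon = {m: head(fused) for m, head in self.decode.items()}
        return fused, recon


def pretraining_loss(model, batch, temperature=0.07):
    """Sketch of the pre-training objective: two random modality subsets of the
    same samples feed a reconstruction loss plus an InfoNCE-style contrastive loss."""
    def random_view(full):
        view = {m: (x if torch.rand(()).item() > 0.5 else None) for m, x in full.items()}
        if all(v is None for v in view.values()):              # keep at least one modality
            first = next(iter(full))
            view[first] = full[first]
        return view

    fused_a, recon_a = model(random_view(batch))
    fused_b, _ = model(random_view(batch))
    recon = sum(F.mse_loss(recon_a[m], batch[m]) for m in batch) / len(batch)
    za, zb = F.normalize(fused_a, dim=1), F.normalize(fused_b, dim=1)
    logits = za @ zb.t() / temperature                          # (B, B) similarity matrix
    targets = torch.arange(za.shape[0], device=za.device)
    return recon + F.cross_entropy(logits, targets)


# Toy usage with made-up modality dimensions (e.g. optical and SAR feature vectors).
model = FusionTokenEncoder({"optical": 64, "sar": 32})
batch = {"optical": torch.randn(8, 64), "sar": torch.randn(8, 32)}
pretraining_loss(model, batch).backward()
```

Because missing modalities are only masked out of the attention rather than imputed, the same network can be trained and evaluated on arbitrary modality subsets, which mirrors the incomplete-input setting targeted by the paper.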
Related papers
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- One-stage Modality Distillation for Incomplete Multimodal Learning [7.791488931628906]
This paper presents a one-stage modality distillation framework that unifies the privileged knowledge transfer and modality information fusion.
The proposed framework can overcome the problem of incomplete modality input in various scenes and achieve state-of-the-art performance.
arXiv Detail & Related papers (2023-09-15T07:12:27Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Our key contribution is the derivation of lower and upper bounds based on a precise information-theoretic definition of interactions.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
- RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning [37.067605349559]
We propose a novel Progressive Fusion Transformer called ProFormer.
It integrates single-modality information into the multimodal representation for robust RGBT tracking.
ProFormer sets a new state-of-the-art performance on RGBT210, RGBT234, LasHeR, and VTUAV datasets.
arXiv Detail & Related papers (2023-03-26T16:55:58Z)
- Omni-Training for Data-Efficient Deep Learning [80.28715182095975]
Recent advances reveal that a properly pre-trained model is endowed with an important property: transferability.
A tight combination of pre-training and meta-training cannot achieve both kinds of transferability.
This motivates the proposed Omni-Training framework towards data-efficient deep learning.
arXiv Detail & Related papers (2021-10-14T16:30:36Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
- Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos [69.61522804742427]
This paper proposes a self-supervised training framework that learns a common multimodal embedding space.
We extend the concept of instance-level contrastive learning with a multimodal clustering step to capture semantic similarities across modalities.
The resulting embedding space enables retrieval of samples across all modalities, even from unseen datasets and different domains.
arXiv Detail & Related papers (2021-04-26T15:55:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.