Context-Based Multimodal Fusion
- URL: http://arxiv.org/abs/2403.04650v2
- Date: Fri, 8 Mar 2024 14:29:41 GMT
- Title: Context-Based Multimodal Fusion
- Authors: Bilal Faye, Hanane Azzag, Mustapha Lebbah, Djamel Bouchaffra
- Abstract summary: We propose an innovative model called Context-Based Multimodal Fusion (CBMF).
CBMF combines modality fusion and data distribution alignment.
CBMF enables the use of large pre-trained models that can be frozen.
- Score: 0.08192907805418585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fusion models, which effectively combine information from different
sources, are widely used to solve multimodal tasks. However, they have
significant limitations related to aligning data distributions across different
modalities. This challenge can lead to inconsistencies and difficulties in
learning robust representations. Alignment models, while specifically
addressing this issue, often require training "from scratch" with large
datasets to achieve optimal results, which can be costly in terms of resources
and time. To overcome these limitations, we propose an innovative model called
Context-Based Multimodal Fusion (CBMF), which combines both modality fusion and
data distribution alignment. In CBMF, each modality is represented by a
specific context vector that is fused with that modality's embedding. This
enables the use of large pre-trained models that can be frozen, reducing the
computational and training data requirements. Additionally, the network learns
to differentiate embeddings of different modalities through fusion with context
and aligns data distributions using a contrastive approach for self-supervised
learning. Thus, CBMF offers an effective and economical solution for solving
complex multimodal tasks.
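The abstract describes the mechanism only at a high level. Below is a minimal PyTorch sketch of that idea, assuming a learnable per-modality context vector, concatenation-based fusion on top of frozen pre-trained encoders, and an InfoNCE-style symmetric contrastive loss; the class names, the fusion operator, and the loss form are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of the CBMF idea, not the authors' reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextFusion(nn.Module):
    """Fuses a frozen encoder's embedding with a learnable, modality-specific context vector."""
    def __init__(self, embed_dim: int, context_dim: int, out_dim: int):
        super().__init__()
        # One learnable context vector per modality, shared across samples (assumed form).
        self.context = nn.Parameter(torch.randn(context_dim) * 0.02)
        self.proj = nn.Sequential(
            nn.Linear(embed_dim + context_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        # embedding: (batch, embed_dim) produced by a frozen pre-trained encoder.
        ctx = self.context.unsqueeze(0).expand(embedding.size(0), -1)
        return self.proj(torch.cat([embedding, ctx], dim=-1))

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss that aligns the two modalities' distributions."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Usage: only the small fusion heads (and their context vectors) are trained;
# the embeddings below stand in for the outputs of frozen pre-trained encoders.
image_head = ContextFusion(embed_dim=768, context_dim=64, out_dim=256)
text_head = ContextFusion(embed_dim=512, context_dim=64, out_dim=256)
img_emb, txt_emb = torch.randn(8, 768), torch.randn(8, 512)
loss = info_nce(image_head(img_emb), text_head(txt_emb))
loss.backward()
```

Because the encoders stay frozen, gradients flow only through the lightweight fusion heads and their context vectors, which is what keeps the computational and training-data requirements low.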
Related papers
- Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges.
This often leads to the issue of missing modalities, where data for certain modalities are absent.
We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
arXiv Detail & Related papers (2024-07-17T14:44:25Z)
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: an Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework, NativE, to achieve multi-modal knowledge graph completion (MMKGC) in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for arbitrary modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion [29.130355774088205]
FuseMoE is a mixture-of-experts framework incorporating an innovative gating function.
Designed to integrate a diverse range of modalities, FuseMoE is effective in managing scenarios with missing modalities and irregularly sampled data trajectories.
arXiv Detail & Related papers (2024-02-05T17:37:46Z)
- Cross-Modal Prototype based Multimodal Federated Learning under Severely Missing Modality [31.727012729846333]
Multimodal Federated Cross Prototype Learning (MFCPL) is a novel approach for multimodal federated learning (MFL) under severely missing modalities.
MFCPL provides diverse modality knowledge at the modality-shared level through cross-modal regularization and at the modality-specific level through a cross-modal contrastive mechanism.
Our approach introduces cross-modal alignment to regularize modality-specific features, thereby enhancing overall performance.
arXiv Detail & Related papers (2024-01-25T02:25:23Z)
- SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets [30.262094419776208]
Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data.
We propose a switching framework which automatically chooses between two complementary methods of cross-modal pseudo-label fusion.
Our method achieves an improvement in mIoU of up to 12% over competing baselines.
arXiv Detail & Related papers (2023-08-23T02:57:58Z)
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method for multimodal fusion that seeks a fixed point of the dynamic multimodal fusion process (see the illustrative sketch after this list).
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
- Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
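On the Deep Equilibrium Multimodal Fusion entry above: the following is a hypothetical sketch of the fixed-point idea, where the fused state z is a fixed point of a small fusion cell f(z, x_1, ..., x_M), found here by naive iteration. The cell architecture, tolerance, and the unrolled backward pass are placeholder assumptions; an actual DEQ method would use a proper root solver and implicit differentiation rather than this loop.

```python
# Hypothetical illustration of fixed-point (deep-equilibrium) fusion, not the paper's code.
import torch
import torch.nn as nn

class DEQFusionCell(nn.Module):
    """One update of the fused state given the (fixed) per-modality features."""
    def __init__(self, dim: int, num_modalities: int):
        super().__init__()
        self.mix = nn.Linear(dim * (num_modalities + 1), dim)

    def forward(self, z: torch.Tensor, xs: list) -> torch.Tensor:
        return torch.tanh(self.mix(torch.cat([z, *xs], dim=-1)))

def deq_fuse(cell, xs, max_iters=50, tol=1e-4):
    """Iterate z <- f(z, xs) until the update stops changing (a numerical fixed point)."""
    z = torch.zeros_like(xs[0])
    for _ in range(max_iters):
        z_next = cell(z, xs)
        if torch.norm(z_next - z).item() < tol:
            return z_next
        z = z_next
    return z

# Usage with two modalities whose features share the fusion dimension.
cell = DEQFusionCell(dim=128, num_modalities=2)
fused = deq_fuse(cell, [torch.randn(4, 128), torch.randn(4, 128)])
print(fused.shape)  # torch.Size([4, 128])
```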
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.