Meta Fusion: A Unified Framework For Multimodality Fusion with Mutual Learning
- URL: http://arxiv.org/abs/2507.20089v1
- Date: Sun, 27 Jul 2025 00:50:29 GMT
- Title: Meta Fusion: A Unified Framework For Multimodality Fusion with Mutual Learning
- Authors: Ziyi Liang, Annie Qu, Babak Shahbaba
- Abstract summary: We introduce Meta Fusion, a flexible and principled framework that unifies existing strategies as special cases. Motivated by deep mutual learning and ensemble learning, Meta Fusion constructs a cohort of models based on various combinations of latent representations across modalities. Our approach is model-agnostic in learning the latent representations, allowing it to flexibly adapt to the unique characteristics of each modality.
- Score: 1.5367554212163714
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Developing effective multimodal data fusion strategies has become increasingly essential for improving the predictive power of statistical machine learning methods across a wide range of applications, from autonomous driving to medical diagnosis. Traditional fusion methods, including early, intermediate, and late fusion, integrate data at different stages, each offering distinct advantages and limitations. In this paper, we introduce Meta Fusion, a flexible and principled framework that unifies these existing strategies as special cases. Motivated by deep mutual learning and ensemble learning, Meta Fusion constructs a cohort of models based on various combinations of latent representations across modalities, and further boosts predictive performance through soft information sharing within the cohort. Our approach is model-agnostic in learning the latent representations, allowing it to flexibly adapt to the unique characteristics of each modality. Theoretically, our soft information sharing mechanism reduces the generalization error. Empirically, Meta Fusion consistently outperforms conventional fusion strategies in extensive simulation studies. We further validate our approach on real-world applications, including Alzheimer's disease detection and neural decoding.
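To make the cohort-plus-mutual-learning idea concrete, below is a minimal sketch, assuming a classification task and PyTorch encoders; all names, dimensions, and the loss weighting (CohortMember, mutual_learning_loss, alpha) are illustrative assumptions, not the authors' implementation.
```python
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

class CohortMember(nn.Module):
    """One predictor built on a particular subset of modality latents."""
    def __init__(self, encoders, subset, latent_dim, n_classes):
        super().__init__()
        self.encoders = encoders                      # shared, model-agnostic encoders
        self.subset = subset                          # e.g. (0,), (1,), or (0, 1)
        self.head = nn.Linear(latent_dim * len(subset), n_classes)

    def forward(self, inputs):
        latents = [self.encoders[m](inputs[m]) for m in self.subset]
        return self.head(torch.cat(latents, dim=-1))

def mutual_learning_loss(logits_list, targets, alpha=0.5):
    """Supervised loss plus soft information sharing: each cohort member is
    nudged toward its peers' predictive distributions (deep mutual learning)."""
    total = 0.0
    for i, logits in enumerate(logits_list):
        ce = F.cross_entropy(logits, targets)
        peers = [F.softmax(l.detach(), dim=-1)
                 for j, l in enumerate(logits_list) if j != i]
        kl = sum(F.kl_div(F.log_softmax(logits, dim=-1), p, reduction="batchmean")
                 for p in peers) / len(peers)
        total = total + ce + alpha * kl
    return total / len(logits_list)

# Build a cohort over all non-empty modality subsets; early, intermediate,
# and late fusion correspond to particular choices of subsets and sharing.
modality_dims, latent_dim, n_classes = [16, 24], 32, 3
encoders = nn.ModuleList([nn.Linear(d, latent_dim) for d in modality_dims])
subsets = [s for r in range(1, len(modality_dims) + 1)
           for s in itertools.combinations(range(len(modality_dims)), r)]
cohort = [CohortMember(encoders, s, latent_dim, n_classes) for s in subsets]

# One illustrative loss computation on random data
inputs = [torch.randn(8, d) for d in modality_dims]
targets = torch.randint(0, n_classes, (8,))
loss = mutual_learning_loss([m(inputs) for m in cohort], targets)
```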
Related papers
- Cross-Modal Alignment via Variational Copula Modelling [54.25504956780864]
It is essential to develop multimodal learning methods to aggregate various information from multiple modalities. Existing methods mainly rely on concatenation or the Kronecker product, oversimplifying the interaction structure between modalities. We propose a novel copula-driven multimodal learning framework, which focuses on learning the joint distribution of various modalities.
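As a rough illustration of the copula idea only (not the paper's variational formulation), the toy sketch below rank-transforms per-modality scores and estimates cross-modal dependence with a Gaussian copula; the function names and data are assumptions.
```python
import numpy as np
from scipy.stats import norm, rankdata

def gaussian_copula_correlation(modality_scores):
    """Estimate cross-modal dependence from per-modality 1-D scores."""
    # probability integral transform via ranks -> approximately Uniform(0, 1)
    U = np.column_stack([rankdata(s) / (len(s) + 1) for s in modality_scores])
    Z = norm.ppf(U)                      # map to standard normal space
    return np.corrcoef(Z, rowvar=False)  # copula correlation matrix

rng = np.random.default_rng(0)
x = rng.normal(size=200)
scores = [x + 0.5 * rng.normal(size=200), np.exp(x) + rng.normal(size=200)]
R = gaussian_copula_correlation(scores)  # dependence survives the nonlinear marginal
```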
arXiv Detail & Related papers (2025-11-05T05:28:28Z)
- Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction [17.717216490402482]
We propose a novel multimodal learning framework that integrates enhanced modality dropout and contrastive learning. We validate our framework on large-scale clinical datasets for disease detection and prediction tasks. Our findings highlight the effectiveness, efficiency, and generalizability of our approach for multimodal learning.
arXiv Detail & Related papers (2025-09-22T18:12:12Z)
- Decouple, Reorganize, and Fuse: A Multimodal Framework for Cancer Survival Prediction [25.880454851313434]
Cancer survival analysis commonly integrates information across diverse medical modalities to make survival-time predictions. Existing methods primarily focus on extracting decoupled features from each modality and performing fusion operations such as concatenation, attention, and MoE-based fusion. We propose a novel Decoupling-Reorganization-Fusion framework (DeReF), which introduces a random feature reorganization strategy between the modality decoupling and dynamic MoE fusion modules.
arXiv Detail & Related papers (2025-08-26T03:18:25Z)
- Deep Unrolled Meta-Learning for Multi-Coil and Multi-Modality MRI with Adaptive Optimization [0.0]
We propose a unified deep meta-learning framework for accelerated magnetic resonance imaging (MRI), jointly addressing multi-coil reconstruction and cross-modality synthesis. Our results show significant improvements in PSNR over conventional supervised learning.
arXiv Detail & Related papers (2025-05-08T04:47:12Z)
- Harmony: A Unified Framework for Modality Incremental Learning [81.13765007314781]
This paper investigates the feasibility of developing a unified model capable of incremental learning across continuously evolving modal sequences. We propose a novel framework named Harmony, designed to achieve modal alignment and knowledge retention. Our approach introduces adaptive compatible feature modulation and cumulative modal bridging.
arXiv Detail & Related papers (2025-04-17T06:35:01Z)
- Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z)
- Completed Feature Disentanglement Learning for Multimodal MRIs Analysis [36.32164729310868]
Feature disentanglement (FD)-based methods have achieved significant success in multimodal learning (MML). We propose a novel Complete Feature Disentanglement (CFD) strategy that recovers the information lost during feature decoupling. Specifically, the CFD strategy not only identifies modality-shared and modality-specific features, but also decouples shared features among subsets of multimodal inputs.
arXiv Detail & Related papers (2024-07-06T01:49:38Z)
- Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology [8.802214988309684]
We introduce a hierarchical attention structure to leverage shared and complementary features of the histology and genomics modalities.
Our method surpasses previous state-of-the-art approaches in glioma diagnosis and prognosis tasks.
arXiv Detail & Related papers (2024-06-11T09:06:41Z)
- Advancing Multimodal Data Fusion in Pain Recognition: A Strategy Leveraging Statistical Correlation and Human-Centered Perspectives [0.3749861135832073]
This research presents a novel multimodal data fusion methodology for pain behavior recognition.
We introduce two key innovations: 1) integrating data-driven statistical relevance weights into the fusion strategy, and 2) incorporating human-centric movement characteristics into multimodal representation learning.
Our findings have significant implications for promoting patient-centered healthcare interventions and supporting explainable clinical decision-making.
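A hedged sketch of what "data-driven statistical relevance weights" could look like in practice: each modality's block of features is scaled by how strongly it correlates with the target before a simple concatenation fusion. The function names, weighting rule, and data are assumptions for illustration, not the paper's method.
```python
import numpy as np

def relevance_weights(modality_features, targets):
    """One weight per modality from mean absolute feature-target correlation."""
    scores = []
    for X in modality_features:                     # X: (n_samples, n_features)
        corrs = [abs(np.corrcoef(X[:, j], targets)[0, 1]) for j in range(X.shape[1])]
        scores.append(np.nanmean(corrs))
    scores = np.array(scores)
    return scores / scores.sum()                    # normalize to sum to 1

def weighted_fusion(modality_features, weights):
    """Concatenate modality blocks after scaling each by its relevance weight."""
    return np.concatenate([w * X for w, X in zip(weights, modality_features)], axis=1)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100).astype(float)
mods = [rng.normal(size=(100, 5)), rng.normal(size=(100, 8))]
fused = weighted_fusion(mods, relevance_weights(mods, y))  # (100, 13) fused features
```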
arXiv Detail & Related papers (2024-03-30T11:13:18Z)
- Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z)
- Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
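The alternating scheme can be sketched as follows, assuming per-modality encoders and a single shared classification head: only one modality's encoder is updated per step, while the shared head is refined by every modality in turn. The dimensions, optimizer, and loop are illustrative assumptions, not the MLA authors' code.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

modality_dims, hidden, n_classes = [16, 24], 64, 10
encoders = nn.ModuleList([nn.Linear(d, hidden) for d in modality_dims])
shared_head = nn.Linear(hidden, n_classes)   # captures cross-modal interactions
optimizer = torch.optim.Adam(
    list(encoders.parameters()) + list(shared_head.parameters()), lr=1e-3)

def training_step(batches, step):
    """batches[m] = (x_m, y); a single modality is used per alternating step."""
    m = step % len(encoders)                 # alternate over modalities
    x, y = batches[m]
    logits = shared_head(encoders[m](x))
    loss = F.cross_entropy(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                         # shared head accumulates signal from all modalities
    return loss.item()

# Two illustrative alternating steps on random data
data = [(torch.randn(8, d), torch.randint(0, n_classes, (8,))) for d in modality_dims]
for step in range(2):
    training_step(data, step)
```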
arXiv Detail & Related papers (2023-11-17T18:57:40Z)
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method for multimodal fusion that seeks a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
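A naive illustration of the fixed-point view: the fused state z solves z = f(z, x_1, ..., x_M), found here by plain iteration to convergence. The DEQ approach uses implicit-differentiation solvers rather than this loop, so the class, dimensions, and update rule below are assumptions for exposition only.
```python
import torch
import torch.nn as nn

class FixedPointFusion(nn.Module):
    def __init__(self, dims, hidden=64):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        self.update = nn.Linear(hidden, hidden)

    def forward(self, xs, n_iter=30, tol=1e-4):
        inject = sum(p(x) for p, x in zip(self.proj, xs))  # modality injection term
        z = torch.zeros_like(inject)
        for _ in range(n_iter):
            z_next = torch.tanh(self.update(z) + inject)
            if (z_next - z).norm() < tol:                  # reached the fixed point
                return z_next
            z = z_next
        return z

fusion = FixedPointFusion(dims=[16, 24])
z_star = fusion([torch.randn(8, 16), torch.randn(8, 24)])  # fused equilibrium representation
```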
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- IMF: Interactive Multimodal Fusion Model for Link Prediction [13.766345726697404]
We introduce a novel Interactive Multimodal Fusion (IMF) model to integrate knowledge from different modalities.
Our approach has been demonstrated to be effective through empirical evaluations on several real-world datasets.
arXiv Detail & Related papers (2023-03-20T01:20:02Z)
- Attention Bottlenecks for Multimodal Fusion [90.75885715478054]
Machine perception models are typically modality-specific and optimised for unimodal benchmarks.
We introduce a novel transformer-based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers.
We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks.
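A minimal sketch of the bottleneck idea, assuming two token streams that exchange information only through a small set of shared bottleneck tokens per layer; the class name, averaging rule, and dimensions are illustrative assumptions, not the paper's exact architecture.
```python
import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_b = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tok_a, tok_b, bottleneck):
        # each modality attends only over its own tokens plus the bottleneck
        in_a = torch.cat([tok_a, bottleneck], dim=1)
        in_b = torch.cat([tok_b, bottleneck], dim=1)
        out_a, _ = self.attn_a(in_a, in_a, in_a)
        out_b, _ = self.attn_b(in_b, in_b, in_b)
        n_a, n_b = tok_a.size(1), tok_b.size(1)
        # bottleneck tokens are averaged across the two modality streams
        new_bottleneck = (out_a[:, n_a:] + out_b[:, n_b:]) / 2
        return out_a[:, :n_a], out_b[:, :n_b], new_bottleneck

layer = BottleneckFusionLayer(dim=64)
a, b, btl = torch.randn(2, 10, 64), torch.randn(2, 12, 64), torch.randn(2, 4, 64)
a, b, btl = layer(a, b, btl)  # cross-modal information flows only through btl
```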
arXiv Detail & Related papers (2021-06-30T22:44:12Z)