Cross-Modal Prototype based Multimodal Federated Learning under Severely
Missing Modality
- URL: http://arxiv.org/abs/2401.13898v1
- Date: Thu, 25 Jan 2024 02:25:23 GMT
- Title: Cross-Modal Prototype based Multimodal Federated Learning under Severely
Missing Modality
- Authors: Huy Q. Le, Chu Myaet Thwal, Yu Qiao, Ye Lin Tun, Minh N. H. Nguyen and
Choong Seon Hong
- Abstract summary: Multimodal Federated Cross Prototype Learning (MFCPL) is a novel approach for MFL under severely missing modalities.
MFCPL provides diverse modality knowledge in modality-shared level with the cross-modal regularization and modality-specific level with cross-modal contrastive mechanism.
Our approach introduces the cross-modal alignment to provide regularization for modality-specific features, thereby enhancing overall performance.
- Score: 31.727012729846333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal federated learning (MFL) has emerged as a decentralized machine
learning paradigm, allowing multiple clients with different modalities to
collaborate on training a machine learning model across diverse data sources
without sharing their private data. However, challenges, such as data
heterogeneity and severely missing modalities, pose crucial hindrances to the
robustness of MFL, significantly impacting the performance of global model. The
absence of a modality introduces misalignment during the local training phase,
stemming from zero-filling in the case of clients with missing modalities.
Consequently, achieving robust generalization in global model becomes
imperative, especially when dealing with clients that have incomplete data. In
this paper, we propose Multimodal Federated Cross Prototype Learning (MFCPL), a
novel approach for MFL under severely missing modalities by conducting the
complete prototypes to provide diverse modality knowledge in modality-shared
level with the cross-modal regularization and modality-specific level with
cross-modal contrastive mechanism. Additionally, our approach introduces the
cross-modal alignment to provide regularization for modality-specific features,
thereby enhancing overall performance, particularly in scenarios involving
severely missing modalities. Through extensive experiments on three multimodal
datasets, we demonstrate the effectiveness of MFCPL in mitigating these
challenges and improving the overall performance.
Related papers
- Leveraging Foundation Models for Multi-modal Federated Learning with Incomplete Modality [41.79433449873368]
We propose a novel multi-modal federated learning method, Federated Multi-modal contrastiVe training with Pre-trained completion (FedMVP)
FedMVP integrates the large-scale pre-trained models to enhance the federated training.
We demonstrate that the model achieves superior performance over two real-world image-text classification datasets.
arXiv Detail & Related papers (2024-06-16T19:18:06Z) - U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantics.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z) - Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - Balanced Multi-modal Federated Learning via Cross-Modal Infiltration [19.513099949266156]
Federated learning (FL) underpins advancements in privacy-preserving distributed computing.
We propose a novel Cross-Modal Infiltration Federated Learning (FedCMI) framework.
arXiv Detail & Related papers (2023-12-31T05:50:15Z) - Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z) - Unified Multi-modal Unsupervised Representation Learning for
Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning
with Hierarchical Aggregation [16.308470947384134]
HA-Fedformer is a novel transformer-based model that empowers unimodal training with only a unimodal dataset at the client.
We develop an uncertainty-aware aggregation method for the local encoders with layer-wise Markov Chain Monte Carlo sampling.
Our experiments on popular sentiment analysis benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms state-of-the-art multimodal models.
arXiv Detail & Related papers (2023-03-27T07:07:33Z) - Multimodal Federated Learning via Contrastive Representation Ensemble [17.08211358391482]
Federated learning (FL) serves as a privacy-conscious alternative to centralized machine learning.
Existing FL methods all rely on model aggregation on single modality level.
We propose Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL)
arXiv Detail & Related papers (2023-02-17T14:17:44Z) - Efficient Multimodal Transformer with Dual-Level Feature Restoration for
Robust Multimodal Sentiment Analysis [47.29528724322795]
Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently.
Despite significant progress, there are still two major challenges on the way towards robust MSA.
We propose a generic and unified framework to address them, named Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR)
arXiv Detail & Related papers (2022-08-16T08:02:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.