Multimodal Federated Learning with Missing Modality via Prototype Mask and Contrast
- URL: http://arxiv.org/abs/2312.13508v2
- Date: Sun, 4 Feb 2024 07:44:56 GMT
- Title: Multimodal Federated Learning with Missing Modality via Prototype Mask and Contrast
- Authors: Guangyin Bao, Qi Zhang, Duoqian Miao, Zixuan Gong, Liang Hu, Ke Liu,
Yang Liu, Chongyang Shi
- Abstract summary: In this paper, we introduce a prototype library into the FedAvg-based Federated Learning framework.
The proposed method utilizes prototypes as masks representing missing modalities to formulate a task-calibrated training loss and a model-agnostic uni-modality inference strategy.
Compared to the baselines, our method improved inference accuracy by 3.7% with 50% modality missing during training and by 23.8% during uni-modality inference.
- Score: 23.936677199734213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real-world scenarios, multimodal federated learning often faces the
practical challenge of intricate modality missing, which poses constraints on
building federated frameworks and significantly degrades model inference
accuracy. Existing solutions for addressing missing modalities generally
involve developing modality-specific encoders on clients and training modality
fusion modules on servers. However, these methods are primarily constrained to
specific scenarios with either unimodal clients or complete multimodal clients,
struggling to generalize effectively in the intricate modality missing
scenarios. In this paper, we introduce a prototype library into the
FedAvg-based Federated Learning framework, thereby empowering the framework
with the capability to alleviate the global model performance degradation
resulting from modality missing during both training and testing. The proposed
method utilizes prototypes as masks representing missing modalities to
formulate a task-calibrated training loss and a model-agnostic uni-modality
inference strategy. In addition, a proximal term based on prototypes is
constructed to enhance local training. Experimental results demonstrate the
state-of-the-art performance of our approach. Compared to the baselines, our
method improved inference accuracy by 3.7% with 50% modality missing during
training and by 23.8% during uni-modality inference. Code is available at
https://github.com/BaoGuangYin/PmcmFL.
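
Below is a minimal, illustrative PyTorch sketch of the idea described in the abstract: class prototypes from a shared library stand in as masks for missing modalities during local training, a prototype-based proximal term regularizes the available uni-modal features, and nearest-prototype matching supports model-agnostic uni-modality inference. This is not the authors' PmcmFL implementation (see the repository linked above); the names (PrototypeLibrary, masked_forward, local_loss, unimodal_inference), the two-modality image-text setup, and the simple late-fusion classifier are assumptions for illustration only.

```python
# Hypothetical sketch of prototype-masked training and prototype-based
# uni-modality inference; not the official PmcmFL code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeLibrary:
    """Per-class, per-modality prototypes shared between server and clients."""

    def __init__(self, num_classes: int, feat_dim: int, modalities=("image", "text")):
        self.protos = {m: torch.zeros(num_classes, feat_dim) for m in modalities}

    def update(self, modality: str, features: torch.Tensor, labels: torch.Tensor):
        # Class-mean features computed locally; in a FedAvg-style round the
        # server could average these across clients to refresh the library.
        for c in labels.unique():
            mask = labels == c
            self.protos[modality][c] = features[mask].mean(dim=0).detach()

    def get(self, modality: str, labels: torch.Tensor) -> torch.Tensor:
        # Look up the prototype of each sample's class for one modality.
        return self.protos[modality][labels]


def masked_forward(encoders, fusion_head, batch, labels, library):
    """Encode each modality; a missing modality is replaced by its class prototype."""
    feats = []
    for m, encoder in encoders.items():
        if batch.get(m) is not None:
            feats.append(encoder(batch[m]))
        else:
            # The prototype acts as a mask standing in for the absent modality.
            feats.append(library.get(m, labels))
    fused = torch.cat(feats, dim=-1)
    return fusion_head(fused)


def local_loss(logits, labels, encoders, batch, library, prox_weight=0.1):
    """Task loss on prototype-completed inputs plus a prototype proximal term
    pulling available uni-modal features toward their class prototypes."""
    loss = F.cross_entropy(logits, labels)
    for m, encoder in encoders.items():
        if batch.get(m) is not None:
            z = encoder(batch[m])
            loss = loss + prox_weight * F.mse_loss(z, library.get(m, labels))
    return loss


@torch.no_grad()
def unimodal_inference(encoder, modality, x, library):
    """Model-agnostic uni-modality inference: classify by nearest prototype."""
    z = F.normalize(encoder(x), dim=-1)
    protos = F.normalize(library.protos[modality], dim=-1)
    return (z @ protos.T).argmax(dim=-1)
```

In this reading, the prototypes serve triple duty: they mask missing modalities in the training loss, anchor a proximal regularizer for local training, and act as class references for uni-modality inference. Consistent with the FedAvg-based framing of the abstract, a plausible protocol is for the server to average per-class prototypes across clients each round alongside the model weights, though the exact aggregation is not specified here.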
Related papers
- Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges.
This often leads to the issue of missing modalities, where data for certain modalities are absent.
We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
arXiv Detail & Related papers (2024-07-17T14:44:25Z)
- Leveraging Foundation Models for Multi-modal Federated Learning with Incomplete Modality [41.79433449873368]
We propose a novel multi-modal federated learning method, Federated Multi-modal contrastiVe training with Pre-trained completion (FedMVP).
FedMVP integrates the large-scale pre-trained models to enhance the federated training.
We demonstrate that the model achieves superior performance on two real-world image-text classification datasets.
arXiv Detail & Related papers (2024-06-16T19:18:06Z)
- Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization [14.606035444283984]
Current approaches focus on developing models that handle modality-incomplete inputs during inference.
We propose a robust universal model with modality reconstruction and model personalization.
Our method has been extensively validated on two brain tumor segmentation benchmarks.
arXiv Detail & Related papers (2024-06-04T06:07:24Z)
- Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z)
- Cross-Modal Prototype based Multimodal Federated Learning under Severely Missing Modality [31.727012729846333]
Multimodal Federated Cross Prototype Learning (MFCPL) is a novel approach for MFL under severely missing modalities.
MFCPL provides diverse modality knowledge at the modality-shared level through cross-modal regularization and at the modality-specific level through a cross-modal contrastive mechanism.
Our approach introduces cross-modal alignment to regularize modality-specific features, thereby enhancing overall performance.
arXiv Detail & Related papers (2024-01-25T02:25:23Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- Towards Good Practices for Missing Modality Robust Action Recognition [20.26021126604409]
This paper seeks a set of good practices for multi-modal action recognition.
First, we study how to effectively regularize the model during training.
Second, we investigate fusion methods for robustness to missing modalities.
Third, we propose a simple modular network, ActionMAE, which learns missing modality predictive coding.
arXiv Detail & Related papers (2022-11-25T06:10:57Z)
- Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
- Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
The Prototype-centered Attentive Learning (PAL) model is composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)