Multi-Modal One-Shot Federated Ensemble Learning for Medical Data with Vision Large Language Model
- URL: http://arxiv.org/abs/2501.03292v1
- Date: Mon, 06 Jan 2025 08:36:28 GMT
- Title: Multi-Modal One-Shot Federated Ensemble Learning for Medical Data with Vision Large Language Model
- Authors: Naibo Wang, Yuchen Deng, Shichen Fan, Jianwei Yin, See-Kiong Ng
- Abstract summary: We introduce FedMME, an innovative one-shot multi-modal federated ensemble learning framework.
FedMME capitalizes on vision large language models to produce textual reports from medical images.
It surpasses existing one-shot federated learning approaches by more than 17.5% in accuracy on the RSNA dataset.
- Score: 27.299068494473016
- Abstract: Federated learning (FL) has attracted considerable interest in the medical domain due to its capacity to facilitate collaborative model training while maintaining data privacy. However, conventional FL methods typically necessitate multiple communication rounds, leading to significant communication overhead and delays, especially in environments with limited bandwidth. One-shot federated learning addresses these issues by conducting model training and aggregation in a single communication round, thereby reducing communication costs while preserving privacy. Among these approaches, one-shot federated ensemble learning combines independently trained client models using ensemble techniques such as voting, further boosting performance in non-IID data scenarios. On the other hand, existing machine learning methods in healthcare predominantly use unimodal data (e.g., medical images or textual reports), which restricts their diagnostic accuracy and comprehensiveness. The integration of multi-modal data has therefore been proposed to address these shortcomings. In this paper, we introduce FedMME, an innovative one-shot multi-modal federated ensemble learning framework that utilizes multi-modal data for medical image analysis. Specifically, FedMME capitalizes on vision large language models to produce textual reports from medical images, employs a BERT model to extract textual features from these reports, and amalgamates these features with visual features to improve diagnostic accuracy. Experimental results show that our method achieves superior performance compared to existing one-shot federated learning methods in healthcare scenarios across four datasets with various data distributions. For instance, it surpasses existing one-shot federated learning approaches by more than 17.5% in accuracy on the RSNA dataset under a Dirichlet distribution with $\alpha = 0.3$.
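The pipeline described above (a vision LLM writes a report for each medical image, BERT encodes the report, text and image features are fused, and independently trained client models are combined by voting in a single round) is concrete enough to sketch in code. The snippet below is a minimal, hypothetical illustration only, not the authors' implementation: `vlm_report_features` is a stub standing in for the vision LLM + BERT stage, the data are synthetic, and the per-class Dirichlet ($\alpha$ = 0.3) split merely mimics the non-IID setting mentioned in the abstract.

```python
# Minimal, hypothetical sketch of a FedMME-style one-shot multi-modal
# federated ensemble (NOT the authors' code). The vision-LLM + BERT stage
# is replaced by a random stub and the data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N_CLIENTS, N_CLASSES, N_SAMPLES = 5, 2, 1000
VIS_DIM, TXT_DIM = 64, 32


def vlm_report_features(images):
    # Stand-in for "vision LLM generates a report, BERT encodes it".
    # A real system would return BERT embeddings of the generated reports.
    return rng.normal(size=(len(images), TXT_DIM))


def fuse(visual, textual):
    # Multi-modal fusion by concatenating visual and textual features.
    return np.concatenate([visual, textual], axis=1)


# Synthetic stand-ins for per-image visual features and labels.
visual = rng.normal(size=(N_SAMPLES, VIS_DIM))
labels = rng.integers(0, N_CLASSES, size=N_SAMPLES)
features = fuse(visual, vlm_report_features(visual))

# Non-IID partition: per-class Dirichlet(alpha=0.3) proportions over clients,
# mimicking the label-skew setting mentioned in the abstract.
client_idx = [[] for _ in range(N_CLIENTS)]
for c in range(N_CLASSES):
    idx = np.flatnonzero(labels == c)
    rng.shuffle(idx)
    proportions = rng.dirichlet(0.3 * np.ones(N_CLIENTS))
    cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
    for cid, part in enumerate(np.split(idx, cuts)):
        client_idx[cid].extend(part.tolist())

# One-shot local training: every client fits its own model on fused features.
client_models = []
for cid in range(N_CLIENTS):
    idx = np.array(client_idx[cid], dtype=int)
    if len(idx) == 0 or len(np.unique(labels[idx])) < 2:
        continue  # skip degenerate clients (no data or a single class)
    client_models.append(
        LogisticRegression(max_iter=1000).fit(features[idx], labels[idx]))

# Single communication round: clients upload their trained models once;
# the server combines them by majority voting over per-sample predictions.
test = fuse(rng.normal(size=(10, VIS_DIM)), vlm_report_features(np.zeros((10, 1))))
votes = np.stack([m.predict(test) for m in client_models])  # (clients, samples)
ensemble_pred = np.apply_along_axis(
    lambda col: np.bincount(col, minlength=N_CLASSES).argmax(), axis=0, arr=votes)
print("ensemble predictions:", ensemble_pred)
```

Because each client uploads its locally trained model exactly once and the server only performs the voting step, communication stays at a single round, which is the defining property of the one-shot setting.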
Related papers
- UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities [68.12889379702824]
Vision-Language Models (VLMs) trained via contrastive learning have achieved notable success in natural image tasks.
UniMed is a large-scale, open-source multi-modal medical dataset comprising over 5.3 million image-text pairs.
We trained UniMed-CLIP, a unified VLM for six modalities, achieving notable gains in zero-shot evaluations.
arXiv Detail & Related papers (2024-12-13T18:59:40Z)
- FACMIC: Federated Adaptative CLIP Model for Medical Image Classification [12.166024140377337]
We introduce a federated adaptive Contrastive Language-Image Pretraining (CLIP) model for classification tasks.
We employ a light-weight and efficient feature attention module for CLIP that selects suitable features for each client's data.
We propose a domain adaptation technique to reduce differences in data distribution between clients.
arXiv Detail & Related papers (2024-10-08T13:24:10Z)
- FedMM: Federated Multi-Modal Learning with Modality Heterogeneity in Computational Pathology [3.802258033231335]
Federated Multi-Modal (FedMM) is a learning framework that trains multiple single-modal feature extractors to enhance subsequent classification performance.
FedMM notably outperforms two baselines in accuracy and AUC metrics.
arXiv Detail & Related papers (2024-02-24T16:58:42Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels [0.07176066267895696]
Building powerful and robust deep learning models requires training with large multi-party datasets.
We propose flexible federated learning (FFL) for collaborative training on such data.
We demonstrate that, with heterogeneously labeled datasets, FFL-based training leads to a significant performance increase.
arXiv Detail & Related papers (2022-11-24T13:48:54Z)
- Decentralized Distributed Learning with Privacy-Preserving Data Synthesis [9.276097219140073]
In the medical field, multi-center collaborations are often sought to yield more generalizable findings by leveraging the heterogeneity of patient and clinical data.
Recent privacy regulations hinder the sharing of data and, consequently, the development of machine learning-based solutions that support diagnosis and prognosis.
We present a decentralized distributed method that integrates features from local nodes, providing models able to generalize across multiple datasets while maintaining privacy.
arXiv Detail & Related papers (2022-06-20T23:49:38Z)
- Practical Challenges in Differentially-Private Federated Survival Analysis of Medical Data [57.19441629270029]
In this paper, we take advantage of the inherent properties of neural networks to federate the training of survival analysis models.
In the realistic setting of small medical datasets and only a few data centers, the noise added for differential privacy makes it harder for the models to converge.
We propose DPFed-post, which adds a post-processing stage to the private federated learning scheme.
arXiv Detail & Related papers (2022-02-08T10:03:24Z)
- Multi-modal AsynDGAN: Learn From Distributed Medical Image Data without Sharing Private Information [55.866673486753115]
We propose an extendable and elastic learning framework to preserve privacy and security.
The proposed framework is named Distributed Asynchronized Discriminator Generative Adversarial Networks (AsynDGAN).
arXiv Detail & Related papers (2020-12-15T20:41:24Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities.
This offers a unique opportunity to obtain and use, at training time, multiple views of the same information that might not always be available at test time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
- Multi-site fMRI Analysis Using Privacy-preserving Federated Learning and Domain Adaptation: ABIDE Results [13.615292855384729]
To train a high-quality deep learning model, the aggregation of a significant amount of patient information is required.
Due to the need to protect the privacy of patient data, it is hard to assemble a central database from multiple institutions.
Federated learning allows for population-level models to be trained without centralizing entities' data.
arXiv Detail & Related papers (2020-01-16T04:49:33Z)