MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models
- URL: http://arxiv.org/abs/2407.09972v2
- Date: Sun, 29 Jun 2025 15:48:58 GMT
- Title: MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models
- Authors: Shanghao Shi, Md Shahedul Haque, Abhijeet Parida, Chaoyu Zhang, Marius George Linguraru, Y. Thomas Hou, Syed Muhammad Anwar, Wenjing Lou
- Abstract summary: Federated learning (FL) allows participants to collaboratively train machine learning models while keeping their data local. We propose a novel privacy attack called MedLeak, which allows a malicious FL server to recover high-quality site-specific private medical data.
- Score: 20.884070284666105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning (FL) allows participants to collaboratively train machine learning models while keeping their data local, making it ideal for collaborations among healthcare institutions on sensitive data. However, in this paper, we propose a novel privacy attack called MedLeak, which allows a malicious FL server to recover high-quality site-specific private medical data from the client model updates. MedLeak works by introducing an adversarially crafted model during the FL training process. Honest clients, unaware of the insidious changes in the published models, continue to send back their updates as per the standard FL protocol. Leveraging a novel analytical method, MedLeak can efficiently recover private client data from the aggregated parameter updates, eliminating costly optimization. In addition, the scheme relies solely on the aggregated updates, thus rendering secure aggregation protocols ineffective, as they depend on the randomization of intermediate results for security while leaving the final aggregated results unaltered. We implement MedLeak on medical image datasets (MedMNIST, COVIDx CXR-4, and Kaggle Brain Tumor MRI), as well as a medical text dataset (MedAbstract). The results demonstrate that our attack achieves high recovery rates and strong quantitative scores on both image and text datasets. We also thoroughly evaluate MedLeak across different attack parameters, providing insights into key factors that influence attack performance and potential defenses. Furthermore, we demonstrate that the recovered data can support downstream tasks such as disease classification with minimal performance loss. Our findings validate the need for enhanced privacy measures in FL systems, particularly for safeguarding sensitive medical data against powerful model inversion attacks.
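The abstract does not spell out MedLeak's analytical recovery method, but the textbook observation that such analytic inversion attacks build on can be shown for a single fully connected layer y = Wx + b: since dL/dW = (dL/dy) x^T and dL/db = dL/dy, each row of the weight gradient is the input x scaled by one bias-gradient entry. A minimal NumPy sketch of that general principle (an illustration, not MedLeak itself):

```python
import numpy as np

def recover_input(grad_W, grad_b, eps=1e-12):
    """Recover the input x of a linear layer y = W x + b from its gradients:
    grad_W[i] = grad_b[i] * x, so dividing a row by its bias gradient yields x."""
    i = np.argmax(np.abs(grad_b))  # pick a neuron with a nonzero bias gradient
    if np.abs(grad_b[i]) < eps:
        raise ValueError("all bias gradients are (near) zero; recovery fails")
    return grad_W[i] / grad_b[i]

# Toy demonstration: one forward/backward pass through a linear layer.
rng = np.random.default_rng(0)
x = rng.normal(size=4)                       # the "private" input
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
y = W @ x + b
dy = y - rng.normal(size=3)                  # gradient of a squared-error loss w.r.t. y
grad_W, grad_b = np.outer(dy, x), dy         # standard linear-layer gradients
recovered = recover_input(grad_W, grad_b)
```

Secure aggregation does not change this picture when the server can arrange (via a crafted model) for the aggregated gradients to isolate a single client's contribution, which is the setting the abstract describes.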
Related papers
- Forget-MI: Machine Unlearning for Forgetting Multimodal Information in Healthcare Settings [5.200386658850142]
Forget-MI is a novel machine unlearning method for multimodal medical data. We evaluate our results using performance on the forget dataset, performance on the test dataset, and Membership Inference Attack (MIA). Our approach reduces MIA by 0.202 and decreases AUC and F1 scores on the forget set by 0.221 and 0.305, respectively.
arXiv Detail & Related papers (2025-06-29T08:53:23Z) - PatientDx: Merging Large Language Models for Protecting Data-Privacy in Healthcare [2.1046377530356764]
Fine-tuning of Large Language Models (LLMs) has become the default practice for improving model performance on a given task. PatientDx is a model-merging framework that allows the design of effective LLMs for health-predictive tasks without requiring fine-tuning or adaptation on patient data.
arXiv Detail & Related papers (2025-04-24T08:21:04Z) - FedMetaMed: Federated Meta-Learning for Personalized Medication in Distributed Healthcare Systems [7.32609591220333]
We introduce Federated Meta-Learning for Personalized Medication (FedMetaMed). FedMetaMed combines federated learning and meta-learning to create models that adapt to diverse patient data across healthcare systems. We show that FedMetaMed outperforms state-of-the-art FL methods, showing superior generalization even on out-of-distribution cohorts.
arXiv Detail & Related papers (2024-12-05T03:36:55Z) - FACMIC: Federated Adaptative CLIP Model for Medical Image Classification [12.166024140377337]
We introduce a federated adaptive Contrastive Language-Image Pretraining (CLIP) model for classification tasks.
We employ a lightweight and efficient feature attention module for CLIP that selects suitable features for each client's data.
We propose a domain adaptation technique to reduce differences in data distribution between clients.
arXiv Detail & Related papers (2024-10-08T13:24:10Z) - BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning [71.60858267608306]
Medical foundation models are susceptible to backdoor attacks.
This work introduces a method to embed a backdoor into the medical foundation model during the prompt learning phase.
Our method, BAPLe, requires only a minimal subset of data to adjust the noise trigger and the text prompts for downstream tasks.
arXiv Detail & Related papers (2024-08-14T10:18:42Z) - FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
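The summary only states that FreqFed moves updates into the frequency domain before aggregating; the FFT choice, low-frequency truncation, and median-based filtering in the sketch below are illustrative assumptions, not the authors' exact mechanism:

```python
import numpy as np

def freq_filter(updates, keep=4):
    """Toy frequency-domain aggregation: compare clients by the magnitudes of
    their low-frequency components and drop clients whose spectrum deviates
    strongly from the median spectrum, then average the survivors."""
    spectra = np.stack([np.abs(np.fft.rfft(u))[:keep] for u in updates])
    med = np.median(spectra, axis=0)
    dev = np.linalg.norm(spectra - med, axis=1)
    mask = dev <= 2 * np.median(dev)     # keep clients near the median spectrum
    kept = [u for u, m in zip(updates, mask) if m]
    return np.mean(kept, axis=0)

rng = np.random.default_rng(2)
honest = [0.1 * rng.normal(size=8) for _ in range(4)]
poisoned = [5.0 * np.ones(8)]            # a crude poisoned update
agg = freq_filter(honest + poisoned)
```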
arXiv Detail & Related papers (2023-12-07T16:56:24Z) - Histopathological Image Classification and Vulnerability Analysis using Federated Learning [1.104960878651584]
A global model sends its copy to all clients who train these copies, and the clients send the updates (weights) back to it.
Data privacy is protected during training, as it is conducted locally on the clients' devices.
However, the global model is susceptible to data poisoning attacks.
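The round structure described above (broadcast, local training, send weights back, aggregate) can be sketched as a minimal FedAvg-style loop; the toy local-update rule and size-weighted average below are illustrative assumptions:

```python
import numpy as np

def local_update(weights, data, lr=0.5):
    # Toy "training": nudge the weights toward the client's data mean.
    return weights + lr * (data.mean(axis=0) - weights)

def fedavg_round(global_w, client_data):
    # Broadcast the global model, train locally, average the returned
    # weights proportionally to local dataset size (FedAvg-style).
    updates = [local_update(global_w.copy(), d) for d in client_data]
    sizes = np.array([len(d) for d in client_data], dtype=float)
    return np.average(updates, axis=0, weights=sizes)

rng = np.random.default_rng(1)
clients = [rng.normal(loc=c, size=(20, 3)) for c in (0.0, 1.0, 2.0)]
w = np.zeros(3)
for _ in range(10):
    w = fedavg_round(w, clients)         # converges toward the clients' mean
```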
arXiv Detail & Related papers (2023-10-11T10:55:14Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - Client-side Gradient Inversion Against Federated Learning from Poisoning [59.74484221875662]
Federated Learning (FL) enables distributed participants to train a global model without sharing data directly to a central server.
Recent studies have revealed that FL is vulnerable to gradient inversion attack (GIA), which aims to reconstruct the original training samples.
We propose Client-side poisoning Gradient Inversion (CGI), which is a novel attack method that can be launched from clients.
arXiv Detail & Related papers (2023-09-14T03:48:27Z) - Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z) - Federated Learning of Medical Concepts Embedding using BEHRT [0.0]
We propose a federated learning approach for learning medical concepts embedding.
Our approach is based on an embedding model like BEHRT, a deep neural sequence model for EHR data.
We compare the performance of a model trained with FL against a model trained on centralized data.
arXiv Detail & Related papers (2023-05-22T14:05:39Z) - Client-specific Property Inference against Secure Aggregation in Federated Learning [52.8564467292226]
Federated learning has become a widely used paradigm for collaboratively training a common model among different participants.
Many attacks have shown that it is still possible to infer sensitive information such as membership, property, or outright reconstruction of participant data.
We show that simple linear models can effectively capture client-specific properties only from the aggregated model updates.
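The claim that simple linear models suffice can be illustrated on synthetic data; the additive-shift model below is a toy assumption standing in for the real property signal, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic experiment: in rounds where the target client's records carry the
# sensitive property, the aggregated update is shifted slightly (toy signal).
has_prop = rng.integers(0, 2, size=200).astype(bool)
agg_updates = rng.normal(size=(200, 8)) + has_prop[:, None] * 1.0

# A simple linear probe fit by least squares on (aggregated update -> property).
X = np.column_stack([agg_updates, np.ones(len(agg_updates))])
coef, *_ = np.linalg.lstsq(X, has_prop.astype(float), rcond=None)
accuracy = ((X @ coef > 0.5) == has_prop).mean()
```

Even this crude probe separates the two cases well, which is the qualitative point the summary makes about aggregated updates leaking client-specific properties.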
arXiv Detail & Related papers (2023-03-07T14:11:01Z) - CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning [77.27443885999404]
Federated Learning (FL) is a setting for training machine learning models in distributed environments.
We propose a novel method, CANIFE, that uses carefully crafted samples by a strong adversary to evaluate the empirical privacy of a training round.
arXiv Detail & Related papers (2022-10-06T13:30:16Z) - Suppressing Poisoning Attacks on Federated Learning for Medical Imaging [4.433842217026879]
We propose a robust aggregation rule called Distance-based Outlier Suppression (DOS) that is resilient to byzantine failures.
The proposed method computes the distance between local parameter updates of different clients and obtains an outlier score for each client.
The resulting outlier scores are converted into normalized weights using a softmax function, and a weighted average of the local parameters is used for updating the global model.
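The DOS pipeline as summarized (pairwise distances, per-client outlier scores, softmax weights, weighted average) can be sketched directly; the Euclidean metric and mean-distance scoring are assumptions read off the abstract, not the authors' exact implementation:

```python
import numpy as np

def dos_aggregate(updates):
    """Distance-based Outlier Suppression sketch: score each client by its
    mean distance to the other clients' updates, convert negated scores to
    softmax weights, and return the weighted average of the updates."""
    U = np.stack(updates)
    dists = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    scores = dists.sum(axis=1) / (len(U) - 1)   # mean distance to the others
    w = np.exp(-scores)
    w /= w.sum()                                 # softmax over negated scores
    return w @ U

honest = [np.ones(4) + 0.01 * i for i in range(4)]
poisoned = [10.0 * np.ones(4)]                   # byzantine update, far from the rest
agg = dos_aggregate(honest + poisoned)           # stays close to the honest mean
```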
arXiv Detail & Related papers (2022-07-15T00:43:34Z) - Do Gradient Inversion Attacks Make Federated Learning Unsafe? [70.0231254112197]
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
Recent works on the inversion of deep neural networks from model gradients raised concerns about the security of FL in preventing the leakage of training data.
In this work, we show that these attacks presented in the literature are impractical in real FL use-cases and provide a new baseline attack.
arXiv Detail & Related papers (2022-02-14T18:33:12Z) - Personalized Retrogress-Resilient Framework for Real-World Medical
Federated Learning [8.240098954377794]
We propose a personalized retrogress-resilient framework to produce a superior personalized model for each client.
Our experiments on real-world dermoscopic FL dataset prove that our personalized retrogress-resilient framework outperforms state-of-the-art FL methods.
arXiv Detail & Related papers (2021-10-01T13:24:29Z) - Differentially private federated deep learning for multi-site medical
image segmentation [56.30543374146002]
Collaborative machine learning techniques such as federated learning (FL) enable the training of models on effectively larger datasets without data transfer.
Recent initiatives have demonstrated that segmentation models trained with FL can achieve performance similar to locally trained models.
However, FL is not a fully privacy-preserving technique and privacy-centred attacks can disclose confidential patient data.
arXiv Detail & Related papers (2021-07-06T12:57:32Z) - FLOP: Federated Learning on Medical Datasets using Partial Networks [84.54663831520853]
The COVID-19 disease caused by the novel coronavirus has led to a shortage of medical resources.
Different data-driven deep learning models have been developed to assist in the diagnosis of COVID-19.
The data itself is still scarce due to patient privacy concerns.
We propose a simple yet effective algorithm, named Federated Learning on Medical datasets using Partial Networks (FLOP).
arXiv Detail & Related papers (2021-02-10T01:56:58Z) - Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z) - Dynamic Fusion based Federated Learning for COVID-19 Detection [24.644484914824844]
We propose a novel dynamic fusion-based federated learning approach for medical diagnostic image analysis to detect COVID-19 infections.
We present a dynamic fusion method that decides the participating clients according to their local model performance and schedules the model fusion based on the participating clients' training time.
The evaluation results show that the proposed approach is feasible and performs better than the default setting of federated learning.
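The selection-and-scheduling policy described above might be sketched as follows; the accuracy threshold, the field names, and ordering by training time are illustrative assumptions, not the paper's exact rule:

```python
def select_participants(clients, acc_floor=0.7):
    """Keep only clients whose local validation accuracy clears a threshold,
    then order the fusion schedule by reported local training time."""
    eligible = [c for c in clients if c["val_acc"] >= acc_floor]
    return sorted(eligible, key=lambda c: c["train_time"])

clients = [
    {"id": "A", "val_acc": 0.91, "train_time": 120},
    {"id": "B", "val_acc": 0.55, "train_time": 30},   # underperforming: excluded
    {"id": "C", "val_acc": 0.83, "train_time": 45},
]
order = [c["id"] for c in select_participants(clients)]  # → ["C", "A"]
```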
arXiv Detail & Related papers (2020-09-22T09:09:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.