Exploratory Analysis of Federated Learning Methods with Differential
Privacy on MIMIC-III
- URL: http://arxiv.org/abs/2302.04208v1
- Date: Wed, 8 Feb 2023 17:27:44 GMT
- Authors: Aron N. Horvath, Matteo Berchier, Farhad Nooralahzadeh, Ahmed Allam,
Michael Krauthammer
- Abstract summary: Federated learning methods offer the possibility of training machine learning models on privacy-sensitive data sets.
We present an evaluation of the impact of different federation and differential privacy techniques when training models on the open-source MIMIC-III dataset.
- Score: 0.7349727826230862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Federated learning methods offer the possibility of training
machine learning models on privacy-sensitive data sets, which cannot be easily
shared. Multiple regulations pose strict requirements on the storage and usage
of healthcare data, leaving the data in silos (i.e., locked in at healthcare
facilities). The application of federated algorithms on these datasets could
accelerate disease diagnosis and drug development, as well as improve patient
care.
Methods: We present an extensive evaluation of the impact of different
federation and differential privacy techniques when training models on the
open-source MIMIC-III dataset. We analyze a set of parameters influencing
federated model performance, namely data distribution (homogeneous vs.
heterogeneous), communication strategy (communication rounds vs. local
training epochs), and federation strategy (FedAvg vs. FedProx). Furthermore, we
assess and compare two differential privacy (DP) techniques during model
training: a stochastic gradient descent-based differential privacy algorithm
(DP-SGD), and a sparse vector differential privacy technique (DP-SVT).
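For orientation, the two federation strategies compared here differ mainly in the client objective: FedAvg averages client models weighted by local data size, while FedProx additionally penalizes each client's drift from the global model through a proximal term. A minimal sketch (function names and the mu value are illustrative, not taken from the paper):

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Server step shared by FedAvg and FedProx: average client models
    weighted by the number of local training examples."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

def fedprox_local_grad(task_grad, w_local, w_global, mu=0.01):
    """FedProx client step: the usual task gradient plus the gradient of
    the proximal term (mu/2) * ||w - w_global||^2, which pulls each
    client's update back toward the current global model when local
    data distributions are heterogeneous."""
    return task_grad + mu * (w_local - w_global)
```

With mu = 0 the FedProx update reduces to plain local SGD as used by FedAvg; the paper's observation that FedProx copes with imbalanced sites depends on appropriately tuning mu.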
Results: Our experiments show that extreme data distributions across sites
(imbalance either in the number of patients or the positive label ratios
between sites) lead to a deterioration of model performance when trained using
the FedAvg strategy. This issue is resolved by using FedProx with appropriate
hyperparameter tuning. Furthermore, the results show that both differential
privacy techniques can reach model performance similar to that of models
trained without DP, though at the expense of a large, quantifiable privacy
leakage.
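The DP-SGD technique evaluated above follows the standard recipe: clip each per-example gradient to a fixed norm so that no single patient record dominates, average, and add Gaussian noise calibrated to that norm. A minimal sketch, with illustrative parameter values:

```python
import numpy as np

def dp_sgd_step(weights, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=None):
    """One DP-SGD update: bound each example's influence by clipping its
    gradient to clip_norm, then mask the average with Gaussian noise whose
    scale is proportional to clip_norm (the sensitivity of the sum)."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=avg.shape)
    return weights - lr * (avg + noise)
```

A larger noise_multiplier yields a stronger privacy guarantee but lower utility, which is the trade-off the experiments above quantify.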
Conclusions: We empirically evaluate the benefits of two federation
strategies and propose optimal parameter choices when using differential
privacy techniques.
Related papers
- Enhancing Performance for Highly Imbalanced Medical Data via Data Regularization in a Federated Learning Setting [6.22153888560487]
The goal of the proposed method is to enhance model performance for cardiovascular disease prediction.
The method is evaluated across four datasets for cardiovascular disease prediction, which are scattered across different clients.
arXiv Detail & Related papers (2024-05-30T19:15:38Z)
- Communication-Efficient Hybrid Federated Learning for E-health with Horizontal and Vertical Data Partitioning [67.49221252724229]
E-health allows smart devices and medical institutions to collaboratively collect patients' data, which is trained with Artificial Intelligence (AI) technologies to help doctors make diagnoses.
Applying federated learning in e-health faces many challenges.
Medical data is both horizontally and vertically partitioned.
A naive combination of horizontal federated learning (HFL) and vertical federated learning (VFL) has limitations, including low training efficiency, unsound convergence analysis, and a lack of parameter tuning strategies.
arXiv Detail & Related papers (2024-04-15T19:45:07Z)
- Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
- Decentralized Distributed Learning with Privacy-Preserving Data Synthesis [9.276097219140073]
In the medical field, multi-center collaborations are often sought to yield more generalizable findings by leveraging the heterogeneity of patient and clinical data.
Recent privacy regulations hinder the possibility of sharing data, and consequently of developing machine learning-based solutions that support diagnosis and prognosis.
We present a decentralized distributed method that integrates features from local nodes, providing models able to generalize across multiple datasets while maintaining privacy.
arXiv Detail & Related papers (2022-06-20T23:49:38Z)
- Federated Offline Reinforcement Learning [55.326673977320574]
We propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites.
We design the first federated policy optimization algorithm for offline RL with sample complexity guarantees.
We give a theoretical guarantee for the proposed algorithm: the suboptimality of the learned policies is comparable to the rate achievable if the data were not distributed.
arXiv Detail & Related papers (2022-06-11T18:03:26Z)
- A Differentially Private Probabilistic Framework for Modeling the Variability Across Federated Datasets of Heterogeneous Multi-View Observations [4.511923587827301]
We show that our framework can be effectively optimized through expectation maximization (EM) over the latent master distribution and the clients' parameters.
We tested our method on the analysis of multi-modal medical imaging data and clinical scores from distributed clinical datasets of patients affected by Alzheimer's disease.
arXiv Detail & Related papers (2022-04-15T07:20:47Z)
- Practical Challenges in Differentially-Private Federated Survival Analysis of Medical Data [57.19441629270029]
In this paper, we take advantage of the inherent properties of neural networks to federate the training of survival analysis models.
In the realistic setting of small medical datasets and only a few data centers, the noise added for differential privacy makes it harder for the models to converge.
We propose DPFed-post, which adds a post-processing stage to the private federated learning scheme.
arXiv Detail & Related papers (2022-02-08T10:03:24Z)
- Differentially private federated deep learning for multi-site medical image segmentation [56.30543374146002]
Collaborative machine learning techniques such as federated learning (FL) enable the training of models on effectively larger datasets without data transfer.
Recent initiatives have demonstrated that segmentation models trained with FL can achieve performance similar to locally trained models.
However, FL is not a fully privacy-preserving technique and privacy-centred attacks can disclose confidential patient data.
arXiv Detail & Related papers (2021-07-06T12:57:32Z)
- Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
- Anonymizing Data for Privacy-Preserving Federated Learning [3.3673553810697827]
We propose the first syntactic approach for offering privacy in the context of federated learning.
Our approach aims to maximize utility or model performance, while supporting a defensible level of privacy.
We perform a comprehensive empirical evaluation on two important problems in the healthcare domain, using real-world electronic health data of 1 million patients.
arXiv Detail & Related papers (2020-02-21T02:30:16Z)
- Multi-site fMRI Analysis Using Privacy-preserving Federated Learning and Domain Adaptation: ABIDE Results [13.615292855384729]
To train a high-quality deep learning model, the aggregation of a significant amount of patient information is required.
Due to the need to protect the privacy of patient data, it is hard to assemble a central database from multiple institutions.
Federated learning allows for population-level models to be trained without centralizing entities' data.
arXiv Detail & Related papers (2020-01-16T04:49:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.