Anonymizing Data for Privacy-Preserving Federated Learning
- URL: http://arxiv.org/abs/2002.09096v1
- Date: Fri, 21 Feb 2020 02:30:16 GMT
- Title: Anonymizing Data for Privacy-Preserving Federated Learning
- Authors: Olivia Choudhury, Aris Gkoulalas-Divanis, Theodoros Salonidis, Issa
Sylla, Yoonyoung Park, Grace Hsu, Amar Das
- Abstract summary: We propose the first syntactic approach for offering privacy in the context of federated learning.
Our approach aims to maximize utility or model performance, while supporting a defensible level of privacy.
We perform a comprehensive empirical evaluation on two important problems in the healthcare domain, using real-world electronic health data of 1 million patients.
- Score: 3.3673553810697827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning enables training a global machine learning model from data
distributed across multiple sites, without having to move the data. This is
particularly relevant in healthcare applications, where data is rife with
personal, highly-sensitive information, and data analysis methods must provably
comply with regulatory guidelines. Although federated learning prevents sharing
raw data, it is still possible to launch privacy attacks on the model
parameters that are exposed during the training process, or on the generated
machine learning model. In this paper, we propose the first syntactic approach
for offering privacy in the context of federated learning. Unlike the
state-of-the-art differential privacy-based frameworks, our approach aims to
maximize utility or model performance, while supporting a defensible level of
privacy, as demanded by GDPR and HIPAA. We perform a comprehensive empirical
evaluation on two important problems in the healthcare domain, using real-world
electronic health data of 1 million patients. The results demonstrate the
effectiveness of our approach in achieving high model performance, while
offering the desired level of privacy. Through comparative studies, we also
show that, for varying datasets, experimental setups, and privacy budgets, our
approach offers higher model performance than differential privacy-based
techniques in federated learning.
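As a concrete illustration of the distinction, the sketch below shows the general shape of a syntactic (k-anonymity-style) pipeline for federated learning: each site generalizes quasi-identifiers and suppresses rare groups locally before training, and only model weights are shared with the aggregator. This is a minimal sketch, not the authors' algorithm; the field names (age, zip), bin width, and value of k are illustrative assumptions.

```python
# Minimal sketch of a syntactic (k-anonymity-style) privacy step before
# federated training. Illustrative only -- not the paper's implementation;
# the fields, bin width, and k are assumed for the example.
from collections import Counter
import numpy as np

def generalize_age(age, bin_width=10):
    """Coarsen an exact age into a range, e.g. 37 -> '30-39'."""
    lo = (age // bin_width) * bin_width
    return f"{lo}-{lo + bin_width - 1}"

def k_anonymize(records, k=5):
    """Generalize quasi-identifiers, then drop records whose group has < k members."""
    for r in records:
        r["age"] = generalize_age(r["age"])
        r["zip"] = r["zip"][:3] + "**"  # truncate ZIP code to a coarser region
    groups = Counter((r["age"], r["zip"]) for r in records)
    return [r for r in records if groups[(r["age"], r["zip"])] >= k]

def federated_average(site_weights, site_sizes):
    """FedAvg-style aggregation: average per-site weights, weighted by site size."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

# Each site anonymizes and trains locally; only the resulting weights
# (never raw or anonymized records) are sent to the aggregator.
global_weights = federated_average(
    [np.array([0.2, 0.5]), np.array([0.4, 0.1])],  # toy per-site weights
    [800, 200],                                     # per-site record counts
)
```

Note that no noise is injected into training in this style of pipeline, which is why the abstract argues a syntactic approach can retain more utility than differential-privacy-based ones while still offering a defensible (syntactic) privacy guarantee.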
Related papers
- Controllable Synthetic Clinical Note Generation with Privacy Guarantees [7.1366477372157995]
In this paper, we introduce a novel method to "clone" datasets containing Personal Health Information (PHI).
Our approach ensures that the cloned datasets retain the essential characteristics and utility of the original data without compromising patient privacy.
We conduct utility testing to evaluate the performance of machine learning models trained on the cloned datasets.
arXiv Detail & Related papers (2024-09-12T07:38:34Z)
- Improving the Classification Effect of Clinical Images of Diseases for Multi-Source Privacy Protection [0.0]
Protecting private data in the medical field poses challenges to data sharing.
Traditional centralized training methods are difficult to apply due to violations of privacy protection principles.
We propose a training framework for private medical data that is based on data vectors.
arXiv Detail & Related papers (2024-08-23T12:52:24Z)
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT encounters limitations such as the scarcity of instruction data and the risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy-sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Exploratory Analysis of Federated Learning Methods with Differential Privacy on MIMIC-III [0.7349727826230862]
Federated learning methods offer the possibility of training machine learning models on privacy-sensitive data sets.
We present an evaluation of the impact of different federation and differential privacy techniques when training models on the open-source MIMIC-III dataset.
arXiv Detail & Related papers (2023-02-08T17:27:44Z)
- Differentially Private Language Models for Secure Data Sharing [19.918137395199224]
In this paper, we show how to train a generative language model in a differentially private manner and consequently sample data from it.
Using natural language prompts and a new prompt-mismatch loss, we are able to create highly accurate and fluent textual datasets.
We perform thorough experiments indicating that our synthetic datasets do not leak information from our original data and are of high language quality.
arXiv Detail & Related papers (2022-10-25T11:12:56Z)
- Personalized PATE: Differential Privacy for Machine Learning with Individual Privacy Guarantees [1.2691047660244335]
We propose three novel methods to support training an ML model with different personalized privacy guarantees within the training data.
Our experiments show that our personalized privacy methods yield higher accuracy models than the non-personalized baseline.
arXiv Detail & Related papers (2022-02-21T20:16:27Z)
- Personalization Improves Privacy-Accuracy Tradeoffs in Federated Optimization [57.98426940386627]
We show that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy.
We illustrate our theoretical results with experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-02-10T20:44:44Z)
- Differentially private federated deep learning for multi-site medical image segmentation [56.30543374146002]
Collaborative machine learning techniques such as federated learning (FL) enable the training of models on effectively larger datasets without data transfer.
Recent initiatives have demonstrated that segmentation models trained with FL can achieve performance similar to locally trained models.
However, FL is not a fully privacy-preserving technique and privacy-centred attacks can disclose confidential patient data.
arXiv Detail & Related papers (2021-07-06T12:57:32Z)
- GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators [74.16405337436213]
We propose Gradient-Sanitized Wasserstein Generative Adversarial Networks (GS-WGAN).
GS-WGAN allows releasing a sanitized form of sensitive data with rigorous privacy guarantees.
We find our approach consistently outperforms state-of-the-art approaches across multiple metrics.
arXiv Detail & Related papers (2020-06-15T10:01:01Z)
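Several of the related works above instead rely on differential privacy, where the learning signal itself is perturbed (for example, DP-SGD-style training or the gradient-sanitized generator in GS-WGAN). The sketch below shows only the generic gradient-sanitization step (per-example clipping plus Gaussian noise); the clipping norm and noise multiplier are assumed values, and this is not the implementation of any paper listed here.

```python
# Illustrative gradient-sanitization step used by differential-privacy-based
# training: clip each per-example gradient, then add calibrated Gaussian noise.
# Assumed hyperparameters; not the GS-WGAN or DP-SGD implementation.
import numpy as np

def sanitize_gradients(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Return a noisy, clipped average of per-example gradients."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    summed = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / len(per_example_grads)

# Toy usage: two per-example gradients of a two-parameter model.
noisy_grad = sanitize_gradients([np.array([0.3, -2.0]), np.array([1.5, 0.7])])
```

The injected noise is what trades model accuracy for a formal differential-privacy guarantee; the syntactic approach of the main paper avoids this perturbation entirely, which is the basis for its reported utility advantage.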
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.