Anonymizing Data for Privacy-Preserving Federated Learning
- URL: http://arxiv.org/abs/2002.09096v1
- Date: Fri, 21 Feb 2020 02:30:16 GMT
- Title: Anonymizing Data for Privacy-Preserving Federated Learning
- Authors: Olivia Choudhury, Aris Gkoulalas-Divanis, Theodoros Salonidis, Issa
Sylla, Yoonyoung Park, Grace Hsu, Amar Das
- Abstract summary: We propose the first syntactic approach for offering privacy in the context of federated learning.
Our approach aims to maximize utility or model performance, while supporting a defensible level of privacy.
We perform a comprehensive empirical evaluation on two important problems in the healthcare domain, using real-world electronic health data of 1 million patients.
- Score: 3.3673553810697827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning enables training a global machine learning model from data
distributed across multiple sites, without having to move the data. This is
particularly relevant in healthcare applications, where data is rife with
personal, highly-sensitive information, and data analysis methods must provably
comply with regulatory guidelines. Although federated learning prevents sharing
raw data, it is still possible to launch privacy attacks on the model
parameters that are exposed during the training process, or on the generated
machine learning model. In this paper, we propose the first syntactic approach
for offering privacy in the context of federated learning. Unlike the
state-of-the-art differential privacy-based frameworks, our approach aims to
maximize utility or model performance, while supporting a defensible level of
privacy, as demanded by GDPR and HIPAA. We perform a comprehensive empirical
evaluation on two important problems in the healthcare domain, using real-world
electronic health data of 1 million patients. The results demonstrate the
effectiveness of our approach in achieving high model performance, while
offering the desired level of privacy. Through comparative studies, we also
show that, for varying datasets, experimental setups, and privacy budgets, our
approach offers higher model performance than differential privacy-based
techniques in federated learning.
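As a concrete illustration of the distinction, the sketch below shows the general shape of a syntactic (k-anonymity-style) pipeline for federated learning: each site generalizes quasi-identifiers and suppresses rare groups locally before training, and only model weights are shared with the aggregator. This is a minimal sketch, not the authors' algorithm; the field names (age, zip), bin width, and value of k are illustrative assumptions.

```python
# Minimal sketch of a syntactic (k-anonymity-style) privacy step before
# federated training. Illustrative only -- not the paper's implementation;
# the fields, bin width, and k are assumed for the example.
from collections import Counter
import numpy as np

def generalize_age(age, bin_width=10):
    """Coarsen an exact age into a range, e.g. 37 -> '30-39'."""
    lo = (age // bin_width) * bin_width
    return f"{lo}-{lo + bin_width - 1}"

def k_anonymize(records, k=5):
    """Generalize quasi-identifiers, then drop records whose group has < k members."""
    for r in records:
        r["age"] = generalize_age(r["age"])
        r["zip"] = r["zip"][:3] + "**"  # truncate ZIP code to a coarser region
    groups = Counter((r["age"], r["zip"]) for r in records)
    return [r for r in records if groups[(r["age"], r["zip"])] >= k]

def federated_average(site_weights, site_sizes):
    """FedAvg-style aggregation: average per-site weights, weighted by site size."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

# Each site anonymizes and trains locally; only the resulting weights
# (never raw or anonymized records) are sent to the aggregator.
global_weights = federated_average(
    [np.array([0.2, 0.5]), np.array([0.4, 0.1])],  # toy per-site weights
    [800, 200],                                     # per-site record counts
)
```

Note that no noise is injected into training in this style of pipeline, which is why the abstract argues a syntactic approach can retain more utility than differential-privacy-based ones while still offering a defensible (syntactic) privacy guarantee.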
Related papers
- Controllable Synthetic Clinical Note Generation with Privacy Guarantees [7.1366477372157995]
In this paper, we introduce a novel method to "clone" datasets containing Personal Health Information (PHI).
Our approach ensures that the cloned datasets retain the essential characteristics and utility of the original data without compromising patient privacy.
We conduct utility testing to evaluate the performance of machine learning models trained on the cloned datasets.
arXiv Detail & Related papers (2024-09-12T07:38:34Z)
- Improving the Classification Effect of Clinical Images of Diseases for Multi-Source Privacy Protection [0.0]
Protecting private data in the medical field poses challenges to data sharing.
Traditional centralized training methods are difficult to apply due to violations of privacy protection principles.
We propose a training framework for private medical data that is based on data vectors.
arXiv Detail & Related papers (2024-08-23T12:52:24Z)
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT encounters limitations such as the scarcity of instruction data and the risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy-sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Exploratory Analysis of Federated Learning Methods with Differential Privacy on MIMIC-III [0.7349727826230862]
Federated learning methods offer the possibility of training machine learning models on privacy-sensitive data sets.
We present an evaluation of the impact of different federation and differential privacy techniques when training models on the open-source MIMIC-III dataset.
arXiv Detail & Related papers (2023-02-08T17:27:44Z)
- Differentially Private Language Models for Secure Data Sharing [19.918137395199224]
In this paper, we show how to train a generative language model in a differentially private manner and consequently sample data from it.
Using natural language prompts and a new prompt-mismatch loss, we are able to create highly accurate and fluent textual datasets.
We perform thorough experiments indicating that our synthetic datasets do not leak information from our original data and are of high language quality.
arXiv Detail & Related papers (2022-10-25T11:12:56Z)
- Personalized PATE: Differential Privacy for Machine Learning with Individual Privacy Guarantees [1.2691047660244335]
We propose three novel methods to support training an ML model with different personalized privacy guarantees within the training data.
Our experiments show that our personalized privacy methods yield higher accuracy models than the non-personalized baseline.
arXiv Detail & Related papers (2022-02-21T20:16:27Z)
- Personalization Improves Privacy-Accuracy Tradeoffs in Federated Optimization [57.98426940386627]
We show that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy.
We illustrate our theoretical results with experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-02-10T20:44:44Z)
- Differentially private federated deep learning for multi-site medical image segmentation [56.30543374146002]
Collaborative machine learning techniques such as federated learning (FL) enable the training of models on effectively larger datasets without data transfer.
Recent initiatives have demonstrated that segmentation models trained with FL can achieve performance similar to locally trained models.
However, FL is not a fully privacy-preserving technique and privacy-centred attacks can disclose confidential patient data.
arXiv Detail & Related papers (2021-07-06T12:57:32Z)
- GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators [74.16405337436213]
We propose Gradient-Sanitized Wasserstein Generative Adversarial Networks (GS-WGAN).
GS-WGAN allows releasing a sanitized form of sensitive data with rigorous privacy guarantees.
We find our approach consistently outperforms state-of-the-art approaches across multiple metrics.
arXiv Detail & Related papers (2020-06-15T10:01:01Z)
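Several of the related works above instead rely on differential privacy, where the learning signal itself is perturbed (for example, DP-SGD-style training or the gradient-sanitized generator in GS-WGAN). The sketch below shows only the generic gradient-sanitization step (per-example clipping plus Gaussian noise); the clipping norm and noise multiplier are assumed values, and this is not the implementation of any paper listed here.

```python
# Illustrative gradient-sanitization step used by differential-privacy-based
# training: clip each per-example gradient, then add calibrated Gaussian noise.
# Assumed hyperparameters; not the GS-WGAN or DP-SGD implementation.
import numpy as np

def sanitize_gradients(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Return a noisy, clipped average of per-example gradients."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    summed = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / len(per_example_grads)

# Toy usage: two per-example gradients of a two-parameter model.
noisy_grad = sanitize_gradients([np.array([0.3, -2.0]), np.array([1.5, 0.7])])
```

The injected noise is what trades model accuracy for a formal differential-privacy guarantee; the syntactic approach of the main paper avoids this perturbation entirely, which is the basis for its reported utility advantage.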
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.