An Analytical Approach to Privacy and Performance Trade-Offs in Healthcare Data Sharing
- URL: http://arxiv.org/abs/2508.18513v1
- Date: Mon, 25 Aug 2025 21:36:47 GMT
- Title: An Analytical Approach to Privacy and Performance Trade-Offs in Healthcare Data Sharing
- Authors: Yusi Wei, Hande Y. Benson, Muge Capan
- Abstract summary: Older adults, frequently hospitalized patients, and racial minorities are vulnerable to privacy attacks. We evaluate three anonymization methods ($k$-anonymity, the technique by Zheng et al., and the MO-OBAM model) based on their ability to reduce re-identification risk.
- Score: 1.2179548969182572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The secondary use of healthcare data is vital for research and clinical innovation, but it raises concerns about patient privacy. This study investigates how to balance privacy preservation and data utility in healthcare data sharing, considering the perspectives of both data providers and data users. Using a dataset of adult patients hospitalized between 2013 and 2015, we predict whether sepsis was present at admission or developed during the hospital stay. We identify sub-populations, such as older adults, frequently hospitalized patients, and racial minorities, that are especially vulnerable to privacy attacks due to their unique combinations of demographic and healthcare utilization attributes. These groups are also critical for machine learning (ML) model performance. We evaluate three anonymization methods ($k$-anonymity, the technique by Zheng et al., and the MO-OBAM model) based on their ability to reduce re-identification risk while maintaining ML utility. Results show that $k$-anonymity offers limited protection. The methods of Zheng et al. and MO-OBAM provide stronger privacy safeguards, with MO-OBAM yielding the best utility outcomes: only a 2% change in precision and recall compared to the original dataset. This work provides actionable insights for healthcare organizations on how to share data responsibly. It highlights the need for anonymization methods that protect vulnerable populations without sacrificing the performance of data-driven models.
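The abstract attributes re-identification risk to unique combinations of demographic and healthcare utilization attributes. The following is a minimal sketch of that idea in terms of $k$-anonymity, not the paper's method: it groups records by a set of quasi-identifiers and flags records whose group is smaller than $k$. The column names (`age_band`, `race`, `prior_admissions`), the toy records, and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def equivalence_class_sizes(records, quasi_identifiers):
    """For each record, return how many records share its quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]

def dataset_k(records, quasi_identifiers):
    """Largest k for which the dataset is k-anonymous (smallest group size)."""
    return min(equivalence_class_sizes(records, quasi_identifiers))

def at_risk(records, quasi_identifiers, k):
    """Records whose quasi-identifier combination is shared by fewer than k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [r for r, size in zip(records, sizes) if size < k]

# Hypothetical toy data: one patient has a rare attribute combination.
QUASI_IDENTIFIERS = ["age_band", "race", "prior_admissions"]
records = [
    {"age_band": "18-44", "race": "A", "prior_admissions": "0-1"},
    {"age_band": "18-44", "race": "A", "prior_admissions": "0-1"},
    {"age_band": "45-64", "race": "A", "prior_admissions": "0-1"},
    {"age_band": "45-64", "race": "A", "prior_admissions": "0-1"},
    {"age_band": "85+", "race": "B", "prior_admissions": "5+"},
]

k_value = dataset_k(records, QUASI_IDENTIFIERS)
vulnerable = at_risk(records, QUASI_IDENTIFIERS, k=2)
```

In this toy data the single older, frequently hospitalized patient forms a group of size one, so the dataset as a whole is only 1-anonymous and that record is the one flagged for $k = 2$.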
Related papers
- On the MIA Vulnerability Gap Between Private GANs and Diffusion Models [51.53790101362898]
Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis. We present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models.
arXiv Detail & Related papers (2025-09-03T14:18:22Z) - A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z) - Differential Privacy-Driven Framework for Enhancing Heart Disease Prediction [7.473832609768354]
Machine learning is critical in healthcare, supporting personalized treatment, early disease detection, predictive analytics, image interpretation, drug discovery, efficient operations, and patient monitoring. In this paper, we utilize machine learning methodologies, including differential privacy and federated learning, to develop privacy-preserving models. Our results show that using a federated learning model with differential privacy achieved a test accuracy of 85%, ensuring patient data remained secure and private throughout the process.
arXiv Detail & Related papers (2025-04-25T01:27:40Z) - Defending Against Gradient Inversion Attacks for Biomedical Images via Learnable Data Perturbation [3.5280398899666903]
We present a defense against gradient inversion attacks in federated learning. Our approach can outperform the baselines with a reduction of 12.5% in the attacker's accuracy in classifying reconstructed images. Results suggest the potential of a generalizable defense for healthcare data.
arXiv Detail & Related papers (2025-03-19T01:53:23Z) - Privacy-Preserving Heterogeneous Federated Learning for Sensitive Healthcare Data [12.30620268528346]
We propose a new framework termed Abstention-Aware Federated Voting (AAFV).
AAFV can collaboratively and confidentially train heterogeneous local models while simultaneously protecting data privacy.
In particular, the proposed abstention-aware voting mechanism exploits a threshold-based abstention method to select high-confidence votes from heterogeneous local models.
arXiv Detail & Related papers (2024-06-15T08:43:40Z) - Preserving The Safety And Confidentiality Of Data Mining Information In Health Care: A literature review [0.0]
Privacy-preserving data mining (PPDM) techniques enable the extraction of actionable insight from enormous volumes of data.
Disclosing sensitive information infringes on patients' privacy.
This paper aims to conduct a review of related work on privacy-preserving mechanisms, data protection regulations, and mitigating tactics.
arXiv Detail & Related papers (2023-10-30T05:32:15Z) - Blockchain-empowered Federated Learning for Healthcare Metaverses: User-centric Incentive Mechanism with Optimal Data Freshness [66.3982155172418]
We first design a user-centric privacy-preserving framework based on decentralized Federated Learning (FL) for healthcare metaverses.
We then utilize Age of Information (AoI) as an effective data-freshness metric and propose an AoI-based contract theory model under Prospect Theory (PT) to motivate sensing data sharing.
arXiv Detail & Related papers (2023-07-29T12:54:03Z) - Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z) - Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z) - Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging [47.99192239793597]
We evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training.
Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.
arXiv Detail & Related papers (2023-02-03T09:49:13Z) - A Review of Anonymization for Healthcare Data [0.30586855806896046]
Health data is highly sensitive and subject to regulations such as the General Data Protection Regulation (GDPR).
arXiv Detail & Related papers (2021-04-13T21:44:29Z) - Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z)