Related papers: Privacy Preserving Data Imputation via Multi-party Computation for Medical Applications

URL: http://arxiv.org/abs/2405.18878v1
Date: Wed, 29 May 2024 08:36:42 GMT
Title: Privacy Preserving Data Imputation via Multi-party Computation for Medical Applications
Authors: Julia Jentsch, Ali Burak Ünal, Şeyma Selcan Mağara, Mete Akgün
Abstract summary: This study addresses privacy-preserving imputation methods for sensitive data using secure multi-party computation. We specifically target the medical and healthcare domains considering the significance of protection of the patient data. Experiments on the diabetes dataset validated the correctness of our privacy-preserving imputation methods, yielding the largest error around $3 \times 10^{-3}$.
Score: 1.7999333451993955
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Handling missing data is crucial in machine learning, but many datasets contain gaps due to errors or non-response. Unlike traditional methods such as listwise deletion, which are simple but inadequate, the literature offers more sophisticated and effective methods, thereby improving sample size and accuracy. However, these methods require accessing the whole dataset, which contradicts the privacy regulations when the data is distributed among multiple sources. Especially in the medical and healthcare domain, such access reveals sensitive information about patients. This study addresses privacy-preserving imputation methods for sensitive data using secure multi-party computation, enabling secure computations without revealing any party's sensitive information. In this study, we realized the mean, median, regression, and kNN imputation methods in a privacy-preserving way. We specifically target the medical and healthcare domains considering the significance of protection of the patient data, showcasing our methods on a diabetes dataset. Experiments on the diabetes dataset validated the correctness of our privacy-preserving imputation methods, yielding the largest error around $3 \times 10^{-3}$, closely matching plaintext methods. We also analyzed the scalability of our methods to varying numbers of samples, showing their applicability to real-world healthcare problems. Our analysis demonstrated that all our methods scale linearly with the number of samples. Except for kNN, the runtime of all our methods indicates that they can be utilized for large datasets.
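To make the idea of mean imputation under secure multi-party computation concrete, here is a minimal sketch based on generic additive secret sharing. It is not the authors' protocol, since the abstract does not specify their framework. The modulus `PRIME`, the fixed-point factor `SCALE`, and the helper names `share`, `reconstruct`, `secure_mean`, and `impute` are all assumptions made for illustration. The sketch also assumes non-negative values and semi-honest parties, and it reveals the global sum and count, which a real protocol might keep hidden.

```python
import secrets

PRIME = 2**61 - 1  # field modulus (assumed for illustration)
SCALE = 10**4      # fixed-point scaling factor (assumed)


def share(value, n_parties):
    """Split a non-negative integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the shared value."""
    return sum(shares) % PRIME


def secure_mean(local_columns):
    """Compute the global mean of observed values held by several parties.

    local_columns: one list per party; None marks a missing entry.
    Each party secret-shares only its local sum and count, so no party
    sees another party's individual values or local aggregates.
    """
    n = len(local_columns)
    sum_shares = [[] for _ in range(n)]    # sum_shares[j]: shares held by party j
    count_shares = [[] for _ in range(n)]
    for column in local_columns:
        observed = [v for v in column if v is not None]
        local_sum = round(sum(observed) * SCALE)  # fixed-point encoding
        for j, s in enumerate(share(local_sum, n)):
            sum_shares[j].append(s)
        for j, s in enumerate(share(len(observed), n)):
            count_shares[j].append(s)
    # Each party adds the shares it holds and publishes only that partial result.
    partial_sums = [sum(s) % PRIME for s in sum_shares]
    partial_counts = [sum(s) % PRIME for s in count_shares]
    total = reconstruct(partial_sums)
    count = reconstruct(partial_counts)
    return total / SCALE / count


def impute(column, fill):
    """Replace missing entries in a party's local column with the fill value."""
    return [fill if v is None else v for v in column]


if __name__ == "__main__":
    # Hypothetical glucose readings split across three data holders.
    columns = [[100.0, None, 120.0], [None, 90.0], [110.0]]
    mean = secure_mean(columns)
    print([impute(c, mean) for c in columns])
```

Each party then fills its own gaps locally with the jointly computed mean. Median, regression, and kNN imputation need comparisons or multiplications on shared values, which this sketch does not cover.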

Related papers

Benchmarking Fraud Detectors on Private Graph Data [70.4654745317714]
Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs. We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties. Third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results. We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results.
arXiv Detail & Related papers (2025-07-30T03:20:15Z)
A Privacy-Preserving Data Collection Method for Diversified Statistical Analysis [11.135689359531105]
This paper proposes a novel real-value negative survey model, termed RVNS, for the first time in the field of real-value sensitive information collection. The RVNS model exempts users from the necessity of discretizing their data and only requires them to sample a set of data from a range that deviates from their actual sensitive details.
arXiv Detail & Related papers (2025-07-23T04:05:33Z)
Integrative Analysis and Imputation of Multiple Data Streams via Deep Gaussian Processes [0.10870620888258162]
Healthcare data presents three key challenges for analysis. First, physiological measurements come from different sources but are inherently related. Second, clinical measurements are collected at irregular intervals, and these sampling times can carry clinical meaning.
arXiv Detail & Related papers (2025-05-17T16:32:52Z)
Defending Against Gradient Inversion Attacks for Biomedical Images via Learnable Data Perturbation [3.5280398899666903]
We present a defense against gradient inversion attacks in federated learning. Our approach can outperform the baselines with a reduction of 12.5% in the attacker's accuracy in classifying reconstructed images. Results suggest the potential of a generalizable defense for healthcare data.
arXiv Detail & Related papers (2025-03-19T01:53:23Z)
Privacy Preserving Federated Unsupervised Domain Adaptation with Application to Age Prediction from DNA Methylation Data [2.699900017799093]
We introduce a privacy-preserving framework for unsupervised domain adaptation in high-dimensional settings. Our framework is the first privacy-preserving solution for high-dimensional domain adaptation in federated environments.
arXiv Detail & Related papers (2024-11-26T10:19:16Z)
Differentially Private Multi-Site Treatment Effect Estimation [28.13660104055298]
Most patient data remains in silo in separate hospitals, preventing the design of data-driven healthcare AI systems. We look at estimating the average treatment effect (ATE), an important task in causal inference for healthcare applications. We address this through a class of per-site estimation algorithms that reports the ATE estimate and its variance as a quality measure.
arXiv Detail & Related papers (2023-10-10T01:21:01Z)
Mean Estimation with User-level Privacy under Data Heterogeneity [54.07947274508013]
Different users may possess vastly different numbers of data points. It cannot be assumed that all users sample from the same underlying distribution. We propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data.
arXiv Detail & Related papers (2023-07-28T23:02:39Z)
Leveraging Unlabelled Data in Multiple-Instance Learning Problems for Improved Detection of Parkinsonian Tremor in Free-Living Conditions [80.88681952022479]
We introduce a new method for combining semi-supervised with multiple-instance learning. We show that by leveraging the unlabelled data of 454 subjects we can achieve large performance gains in per-subject tremor detection.
arXiv Detail & Related papers (2023-04-29T12:25:10Z)
Time-dependent Iterative Imputation for Multivariate Longitudinal Clinical Data [0.0]
Time-Dependent Iterative imputation offers a practical solution for imputing time-series data. When applied to a cohort consisting of more than 500,000 patient observations, our approach outperformed state-of-the-art imputation methods.
arXiv Detail & Related papers (2023-04-16T16:10:49Z)
Deep Imputation of Missing Values in Time Series Health Data: A Review with Benchmarking [0.0]
This survey performs six data-centric experiments to benchmark state-of-the-art deep imputation methods on five time series health data sets. Deep learning methods that jointly perform cross-sectional (across variables) and longitudinal (across time) imputations of missing values in time series data yield statistically better data quality than traditional imputation methods.
arXiv Detail & Related papers (2023-02-10T16:03:36Z)
Distributed sequential federated learning [0.0]
We develop a data-driven method for efficiently and effectively aggregating valued information by analyzing local data. We use numerical studies of simulated data and an application to COVID-19 data collected from 32 hospitals in Mexico.
arXiv Detail & Related papers (2023-01-31T21:20:45Z)
When Accuracy Meets Privacy: Two-Stage Federated Transfer Learning Framework in Classification of Medical Images on Limited Data: A COVID-19 Case Study [77.34726150561087]
COVID-19 pandemic has spread rapidly and caused a shortage of global medical resources. CNN has been widely utilized and verified in analyzing medical images.
arXiv Detail & Related papers (2022-03-24T02:09:41Z)
Practical Challenges in Differentially-Private Federated Survival Analysis of Medical Data [57.19441629270029]
In this paper, we take advantage of the inherent properties of neural networks to federate the process of training of survival analysis models. In the realistic setting of small medical datasets and only a few data centers, this noise makes it harder for the models to converge. We propose DPFed-post which adds a post-processing stage to the private federated learning scheme.
arXiv Detail & Related papers (2022-02-08T10:03:24Z)
MURAL: An Unsupervised Random Forest-Based Embedding for Electronic Health Record Data [59.26381272149325]
We present an unsupervised random forest for representing data with disparate variable types. MURAL forests consist of a set of decision trees where node-splitting variables are chosen at random. We show that using our approach, we can visualize and classify data more accurately than competing approaches.
arXiv Detail & Related papers (2021-11-19T22:02:21Z)
FLOP: Federated Learning on Medical Datasets using Partial Networks [84.54663831520853]
COVID-19 Disease due to the novel coronavirus has caused a shortage of medical resources. Different data-driven deep learning models have been developed to mitigate the diagnosis of COVID-19. The data itself is still scarce due to patient privacy concerns. We propose a simple yet effective algorithm, named Federated Learning on Medical Datasets using Partial Networks (FLOP).
arXiv Detail & Related papers (2021-02-10T01:56:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.