ppAURORA: Privacy Preserving Area Under Receiver Operating
Characteristic and Precision-Recall Curves
- URL: http://arxiv.org/abs/2102.08788v3
- Date: Thu, 15 Jun 2023 16:09:19 GMT
- Title: ppAURORA: Privacy Preserving Area Under Receiver Operating
Characteristic and Precision-Recall Curves
- Authors: Ali Burak Ünal, Nico Pfeifer, Mete Akgün
- Abstract summary: The AUC is a performance measure used to compare the quality of different machine learning models.
Computing the global AUC can itself be a problem, since the labels might also contain privacy-sensitive information.
We propose an MPC-based solution, called ppAURORA, with private merging of individually sorted lists from multiple sources to compute the exact AUC.
- Score: 1.933681537640272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computing an AUC as a performance measure to compare the quality of different
machine learning models is one of the final steps of many research projects.
Many of these models are trained on privacy-sensitive data, and there are
several approaches, such as $\epsilon$-differential privacy, federated
machine learning, and cryptography, for cases where the datasets cannot be shared or used
jointly at one place for training and/or testing. In this setting, it can also
be a problem to compute the global AUC, since the labels might also contain
privacy-sensitive information. There have been approaches based on
$\epsilon$-differential privacy to address this problem, but to the best of our
knowledge, no exact privacy preserving solution has been introduced. In this
paper, we propose an MPC-based solution, called ppAURORA, with private merging
of individually sorted lists from multiple sources to compute the exact AUC as
one could obtain on the pooled original test samples. With ppAURORA, the
computation of the exact area under precision-recall and receiver operating
characteristic curves is possible even when ties between prediction confidence
values exist. We use ppAURORA to evaluate two different models predicting acute
myeloid leukemia therapy response and heart disease, respectively. We also
assess its scalability via synthetic data experiments. All these experiments
show that ppAURORA efficiently and privately computes, for both evaluation
metrics, exactly the same AUC as one would obtain on the pooled test samples
in plaintext, under the semi-honest adversary model.
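The following plaintext sketch illustrates the quantity ppAURORA computes securely: exact ROC and PR AUCs from a merged, sorted list of (confidence, label) pairs with ties handled explicitly. The MPC machinery (secret sharing, private merging) is omitted, and the function names, the trapezoidal PR interpolation, and the toy data are illustrative assumptions rather than the paper's implementation.

```python
from itertools import groupby

def exact_roc_pr_auc(pairs):
    """Exact ROC and PR AUCs from (confidence, label) pairs, with tie handling."""
    pairs = sorted(pairs, key=lambda p: -p[0])         # descending confidence
    P = sum(label for _, label in pairs)               # total positives
    N = len(pairs) - P                                 # total negatives
    tp = fp = prev_tp = prev_fp = 0
    prev_prec = 1.0
    roc_auc = pr_auc = 0.0
    for _, grp in groupby(pairs, key=lambda p: p[0]):  # tied confidences form one group
        grp = list(grp)
        pos = sum(label for _, label in grp)
        tp += pos
        fp += len(grp) - pos
        # trapezoid in (FPR, TPR) space for the ROC curve
        roc_auc += (fp - prev_fp) / N * (tp + prev_tp) / (2 * P)
        # trapezoid in (recall, precision) space for the PR curve
        prec = tp / (tp + fp)
        pr_auc += (tp - prev_tp) / P * (prec + prev_prec) / 2
        prev_tp, prev_fp, prev_prec = tp, fp, prec
    return roc_auc, pr_auc

# Two sites each hold an individually sorted list of (confidence, label) pairs;
# in ppAURORA the merge and the AUC computation happen under MPC instead.
site_a = [(0.9, 1), (0.8, 0), (0.4, 1)]
site_b = [(0.8, 1), (0.3, 0)]
print(exact_roc_pr_auc(site_a + site_b))               # same as on the pooled data
```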
Related papers
- Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines.
We show that our method preserves privacy and derive a generalization error bound for the method.
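A rough sketch of the general idea, not the paper's construction: random Fourier features approximating an RBF kernel, followed by ridge regression whose weights are released with additive Gaussian noise. The names make_rff and noisy_ridge and the noise scale are assumptions of this sketch, and the noise is not calibrated to a formal (epsilon, delta) guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rff(dim, n_features=200, gamma=1.0):
    """Random Fourier feature map approximating an RBF kernel."""
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(dim, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return lambda X: np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def noisy_ridge(Z, y, lam=1.0, noise_scale=0.1):
    """Closed-form ridge weights, released with additive Gaussian noise."""
    m = Z.shape[1]
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ y)
    return w + rng.normal(scale=noise_scale, size=m)   # illustrative, uncalibrated noise

X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
phi = make_rff(X.shape[1])                             # fix the feature map once
w_priv = noisy_ridge(phi(X), y)
preds = phi(X) @ w_priv
```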
arXiv Detail & Related papers (2024-12-06T05:31:08Z)
- Distributed, communication-efficient, and differentially private estimation of KL divergence [15.294136011320433]
A key task in managing distributed, sensitive data is to measure the extent to which a distribution changes.
We describe novel algorithmic approaches for estimating the KL divergence of data across federated models of computation, under differential privacy.
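As a simple baseline for the task studied there, not the paper's algorithms: each site releases a Laplace-noised histogram and a coordinator forms a smoothed plug-in KL estimate. The function names, the epsilon value, and the add/remove-one sensitivity of 1 per histogram are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_histogram(samples, alphabet_size, epsilon):
    """Laplace-noised counts (L1 sensitivity 1 under add/remove-one)."""
    counts = np.bincount(samples, minlength=alphabet_size).astype(float)
    return counts + rng.laplace(scale=1.0 / epsilon, size=alphabet_size)

def kl_from_noisy_counts(c_p, c_q, smooth=1.0):
    """Smoothed plug-in estimate of KL(P || Q) from noisy counts."""
    p = np.clip(c_p, 0, None) + smooth
    q = np.clip(c_q, 0, None) + smooth
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

k = 8
samples_p = rng.choice(k, size=5000, p=np.array([0.3] + [0.1] * (k - 1)))
samples_q = rng.choice(k, size=5000, p=np.full(k, 1.0 / k))
est = kl_from_noisy_counts(noisy_histogram(samples_p, k, epsilon=1.0),
                           noisy_histogram(samples_q, k, epsilon=1.0))
```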
arXiv Detail & Related papers (2024-11-25T15:20:40Z)
- Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning.
One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems.
We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
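For reference, the definitions of the two metrics; the paper's PAC framework and graph-based hypotheses are not reproduced here.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```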
arXiv Detail & Related papers (2024-11-20T04:21:07Z)
- Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis.
Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria.
To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
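For background, a minimal 1-nearest-neighbour propensity score matching sketch, giving an ATT-style estimate by matching treated units to controls. The A2A metric itself is not implemented, and matched_effect and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def matched_effect(X, treated, outcome):
    """1-NN propensity score matching of treated units to controls."""
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.where(treated == 1)[0]
    c_idx = np.where(treated == 0)[0]
    diffs = [outcome[i] - outcome[c_idx[np.argmin(np.abs(ps[c_idx] - ps[i]))]]
             for i in t_idx]
    return float(np.mean(diffs))

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 3))
treated = (rng.random(300) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)  # confounded assignment
outcome = X[:, 0] + 2.0 * treated + rng.normal(size=300)
print(matched_effect(X, treated, outcome))             # roughly 2.0
```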
arXiv Detail & Related papers (2024-07-20T12:42:24Z)
- Differentially Private Multi-Site Treatment Effect Estimation [28.13660104055298]
Most patient data remains siloed in separate hospitals, preventing the design of data-driven healthcare AI systems.
We look at estimating the average treatment effect (ATE), an important task in causal inference for healthcare applications.
We address this through a class of per-site estimation algorithms that reports the ATE estimate and its variance as a quality measure.
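A hedged sketch of such per-site reporting: each site computes a difference-in-means ATE estimate and its variance, perturbs both with Laplace noise, and a coordinator combines sites by inverse-variance weighting. The sensitivity bounds assume outcomes clipped to [0, 1] and are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def site_report(y, t, epsilon):
    """One site's Laplace-noised ATE estimate and variance (outcomes in [0, 1])."""
    y = np.clip(y, 0.0, 1.0)
    y1, y0 = y[t == 1], y[t == 0]
    ate = y1.mean() - y0.mean()
    var = y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0)
    s = 1.0 / min(len(y1), len(y0))                    # rough sensitivity bound
    noisy_ate = ate + rng.laplace(scale=2 * s / epsilon)        # half the budget each
    noisy_var = max(var + rng.laplace(scale=2 * s / epsilon), 1e-6)
    return noisy_ate, noisy_var

def combine(reports):
    """Inverse-variance weighted combination of per-site reports."""
    ates, variances = map(np.asarray, zip(*reports))
    w = 1.0 / variances
    return float(np.sum(w * ates) / np.sum(w))

reports = []
for _ in range(3):
    t = rng.integers(0, 2, size=400)
    y = np.clip(0.4 + 0.2 * t + 0.1 * rng.normal(size=400), 0, 1)
    reports.append(site_report(y, t, epsilon=1.0))
print(combine(reports))                                # near 0.2 in this toy setup
```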
arXiv Detail & Related papers (2023-10-10T01:21:01Z)
- Exploratory Analysis of Federated Learning Methods with Differential Privacy on MIMIC-III [0.7349727826230862]
Federated learning methods offer the possibility of training machine learning models on privacy-sensitive data sets.
We present an evaluation of the impact of different federation and differential privacy techniques when training models on the open-source MIMIC-III dataset.
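A rough FedAvg-style sketch with per-client update clipping and Gaussian noise added at aggregation; the hyperparameters and noise calibration are assumptions of this sketch, and synthetic arrays stand in for MIMIC-III, which is access-restricted.

```python
import numpy as np

rng = np.random.default_rng(4)

def local_update(w, X, y, lr=0.1, clip=1.0):
    """One local least-squares gradient step, clipped in L2 norm."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    delta = -lr * grad
    return delta * min(1.0, clip / (np.linalg.norm(delta) + 1e-12))

def federated_round(w, clients, clip=1.0, noise_mult=1.0):
    """Average clipped client updates and add Gaussian noise at the server."""
    deltas = [local_update(w, X, y, clip=clip) for X, y in clients]
    noise = rng.normal(scale=noise_mult * clip / len(clients), size=w.shape)
    return w + np.mean(deltas, axis=0) + noise

clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
w = np.zeros(3)
for _ in range(10):
    w = federated_round(w, clients)
```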
arXiv Detail & Related papers (2023-02-08T17:27:44Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies that groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Uncertainty-Autoencoder-Based Privacy and Utility Preserving Data Type Conscious Transformation [3.7315964084413173]
We propose an adversarial learning framework that deals with the privacy-utility tradeoff problem under two conditions.
Under data-type ignorant conditions, the privacy mechanism provides a one-hot encoding of categorical features, representing exactly one class.
Under data-type aware conditions, the categorical variables are represented by a collection of scores, one for each class.
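A toy illustration of the two output regimes described above, not the authors' adversarial framework: a categorical feature released either as a hard one-hot vector or as a vector of per-class scores.

```python
import numpy as np

classes = ["A", "B", "C"]
scores = np.array([0.2, 0.7, 0.1])        # data-type aware: one score per class

one_hot = np.zeros(len(classes))           # data-type ignorant: exactly one class
one_hot[np.argmax(scores)] = 1.0

print(dict(zip(classes, scores)), dict(zip(classes, one_hot)))
```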
arXiv Detail & Related papers (2022-05-04T08:40:15Z)
- Differentially Private Estimation of Heterogeneous Causal Effects [9.355532300027727]
We introduce a general meta-algorithm for estimating conditional average treatment effects (CATE) with differential privacy guarantees.
Our meta-algorithm can work with simple, single-stage CATE estimators such as S-learner and more complex multi-stage estimators such as DR and R-learner.
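A minimal S-learner sketch without any differential privacy added: one model is fit on features plus the treatment indicator, and CATE is the difference of its predictions with the treatment toggled. The paper's DP wrapper is omitted and the model choice and toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def s_learner_cate(X, t, y, X_new):
    """Fit one model on (X, t); CATE is the prediction gap when t is toggled."""
    model = GradientBoostingRegressor().fit(np.c_[X, t], y)
    mu1 = model.predict(np.c_[X_new, np.ones(len(X_new))])
    mu0 = model.predict(np.c_[X_new, np.zeros(len(X_new))])
    return mu1 - mu0

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 4))
t = rng.integers(0, 2, size=500)
y = X[:, 0] + (1.0 + X[:, 1]) * t + 0.1 * rng.normal(size=500)
print(s_learner_cate(X, t, y, X[:5]).round(2))   # approximates 1 + X[:5, 1]
```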
arXiv Detail & Related papers (2022-02-22T17:21:18Z)
- Partial sensitivity analysis in differential privacy [58.730520380312676]
We investigate the impact of each input feature on the individual's privacy loss.
We experimentally evaluate our approach on queries over private databases.
We also explore our findings in the context of neural network training on synthetic data.
arXiv Detail & Related papers (2021-09-22T08:29:16Z)
- Sensitivity analysis in differentially private machine learning using hybrid automatic differentiation [54.88777449903538]
We introduce a novel hybrid automatic differentiation (AD) system for sensitivity analysis.
This enables modelling the sensitivity of arbitrary differentiable function compositions, such as the training of neural networks on private data.
Our approach enables principled reasoning about privacy loss in the setting of data processing.
arXiv Detail & Related papers (2021-07-09T07:19:23Z)
- On Deep Learning with Label Differential Privacy [54.45348348861426]
We study the multi-class classification setting where the labels are considered sensitive and ought to be protected.
We propose a new algorithm for training deep neural networks with label differential privacy, and run evaluations on several datasets.
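One standard baseline for label differential privacy is k-ary randomized response applied to the labels only; the sketch below is that baseline, not the paper's algorithm, and the epsilon and class count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def randomize_label(y, num_classes, epsilon):
    """k-ary randomized response: keep y with probability e^eps / (e^eps + k - 1)."""
    p_true = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    if rng.random() < p_true:
        return y
    other = rng.integers(0, num_classes - 1)           # uniform over the other labels
    return int(other if other < y else other + 1)

labels = [0, 2, 1, 1, 3]
noisy = [randomize_label(y, num_classes=4, epsilon=1.0) for y in labels]
```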
arXiv Detail & Related papers (2021-02-11T15:09:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.