ppAURORA: Privacy Preserving Area Under Receiver Operating
Characteristic and Precision-Recall Curves
- URL: http://arxiv.org/abs/2102.08788v3
- Date: Thu, 15 Jun 2023 16:09:19 GMT
- Title: ppAURORA: Privacy Preserving Area Under Receiver Operating
Characteristic and Precision-Recall Curves
- Authors: Ali Burak Ünal, Nico Pfeifer, Mete Akgün
- Abstract summary: The AUC is a performance measure used to compare the quality of different machine learning models.
Computing the global AUC can itself be a problem, since the labels might also contain privacy-sensitive information.
We propose an MPC-based solution, called ppAURORA, with private merging of individually sorted lists from multiple sources to compute the exact AUC.
- Score: 1.933681537640272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computing an AUC as a performance measure to compare the quality of different
machine learning models is one of the final steps of many research projects.
Many of these models are trained on privacy-sensitive data, and there are
several approaches, such as $\epsilon$-differential privacy, federated
machine learning, and cryptography, for cases where the datasets cannot be shared or used
jointly at one place for training and/or testing. In this setting, it can also
be a problem to compute the global AUC, since the labels might also contain
privacy-sensitive information. There have been approaches based on
$\epsilon$-differential privacy to address this problem, but to the best of our
knowledge, no exact privacy preserving solution has been introduced. In this
paper, we propose an MPC-based solution, called ppAURORA, with private merging
of individually sorted lists from multiple sources to compute the exact AUC as
one could obtain on the pooled original test samples. With ppAURORA, the
computation of the exact area under precision-recall and receiver operating
characteristic curves is possible even when ties between prediction confidence
values exist. We use ppAURORA to evaluate two different models predicting acute
myeloid leukemia therapy response and heart disease, respectively. We also
assess its scalability via synthetic data experiments. All these experiments
show that ppAURORA efficiently and privately computes, for both evaluation
metrics, exactly the same AUC as one would obtain on the pooled test samples
in plaintext, under the semi-honest adversary model.
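The following plaintext sketch illustrates the quantity ppAURORA computes securely: exact ROC and PR AUCs from a merged, sorted list of (confidence, label) pairs with ties handled explicitly. The MPC machinery (secret sharing, private merging) is omitted, and the function names, the trapezoidal PR interpolation, and the toy data are illustrative assumptions rather than the paper's implementation.

```python
from itertools import groupby

def exact_roc_pr_auc(pairs):
    """Exact ROC and PR AUCs from (confidence, label) pairs, with tie handling."""
    pairs = sorted(pairs, key=lambda p: -p[0])         # descending confidence
    P = sum(label for _, label in pairs)               # total positives
    N = len(pairs) - P                                 # total negatives
    tp = fp = prev_tp = prev_fp = 0
    prev_prec = 1.0
    roc_auc = pr_auc = 0.0
    for _, grp in groupby(pairs, key=lambda p: p[0]):  # tied confidences form one group
        grp = list(grp)
        pos = sum(label for _, label in grp)
        tp += pos
        fp += len(grp) - pos
        # trapezoid in (FPR, TPR) space for the ROC curve
        roc_auc += (fp - prev_fp) / N * (tp + prev_tp) / (2 * P)
        # trapezoid in (recall, precision) space for the PR curve
        prec = tp / (tp + fp)
        pr_auc += (tp - prev_tp) / P * (prec + prev_prec) / 2
        prev_tp, prev_fp, prev_prec = tp, fp, prec
    return roc_auc, pr_auc

# Two sites each hold an individually sorted list of (confidence, label) pairs;
# in ppAURORA the merge and the AUC computation happen under MPC instead.
site_a = [(0.9, 1), (0.8, 0), (0.4, 1)]
site_b = [(0.8, 1), (0.3, 0)]
print(exact_roc_pr_auc(site_a + site_b))               # same as on the pooled data
```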
Related papers
- Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines.
We show that our method preserves privacy and derive a generalization error bound for the method.
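A rough sketch of the general idea, not the paper's construction: random Fourier features approximating an RBF kernel, followed by ridge regression whose weights are released with additive Gaussian noise. The names make_rff and noisy_ridge and the noise scale are assumptions of this sketch, and the noise is not calibrated to a formal (epsilon, delta) guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rff(dim, n_features=200, gamma=1.0):
    """Random Fourier feature map approximating an RBF kernel."""
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(dim, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return lambda X: np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def noisy_ridge(Z, y, lam=1.0, noise_scale=0.1):
    """Closed-form ridge weights, released with additive Gaussian noise."""
    m = Z.shape[1]
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ y)
    return w + rng.normal(scale=noise_scale, size=m)   # illustrative, uncalibrated noise

X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
phi = make_rff(X.shape[1])                             # fix the feature map once
w_priv = noisy_ridge(phi(X), y)
preds = phi(X) @ w_priv
```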
arXiv Detail & Related papers (2024-12-06T05:31:08Z)
- Distributed, communication-efficient, and differentially private estimation of KL divergence [15.294136011320433]
A key task in managing distributed, sensitive data is to measure the extent to which a distribution changes.
We describe novel algorithmic approaches for estimating the KL divergence of data across federated models of computation, under differential privacy.
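As a simple baseline for the task studied there, not the paper's algorithms: each site releases a Laplace-noised histogram and a coordinator forms a smoothed plug-in KL estimate. The function names, the epsilon value, and the add/remove-one sensitivity of 1 per histogram are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_histogram(samples, alphabet_size, epsilon):
    """Laplace-noised counts (L1 sensitivity 1 under add/remove-one)."""
    counts = np.bincount(samples, minlength=alphabet_size).astype(float)
    return counts + rng.laplace(scale=1.0 / epsilon, size=alphabet_size)

def kl_from_noisy_counts(c_p, c_q, smooth=1.0):
    """Smoothed plug-in estimate of KL(P || Q) from noisy counts."""
    p = np.clip(c_p, 0, None) + smooth
    q = np.clip(c_q, 0, None) + smooth
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

k = 8
samples_p = rng.choice(k, size=5000, p=np.array([0.3] + [0.1] * (k - 1)))
samples_q = rng.choice(k, size=5000, p=np.full(k, 1.0 / k))
est = kl_from_noisy_counts(noisy_histogram(samples_p, k, epsilon=1.0),
                           noisy_histogram(samples_q, k, epsilon=1.0))
```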
arXiv Detail & Related papers (2024-11-25T15:20:40Z)
- Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning.
One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems.
We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
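For reference, the definitions of the two metrics; the paper's PAC framework and graph-based hypotheses are not reproduced here.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```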
arXiv Detail & Related papers (2024-11-20T04:21:07Z)
- Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis.
Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria.
To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
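For background, a minimal 1-nearest-neighbour propensity score matching sketch, giving an ATT-style estimate by matching treated units to controls. The A2A metric itself is not implemented, and matched_effect and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def matched_effect(X, treated, outcome):
    """1-NN propensity score matching of treated units to controls."""
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.where(treated == 1)[0]
    c_idx = np.where(treated == 0)[0]
    diffs = [outcome[i] - outcome[c_idx[np.argmin(np.abs(ps[c_idx] - ps[i]))]]
             for i in t_idx]
    return float(np.mean(diffs))

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 3))
treated = (rng.random(300) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)  # confounded assignment
outcome = X[:, 0] + 2.0 * treated + rng.normal(size=300)
print(matched_effect(X, treated, outcome))             # roughly 2.0
```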
arXiv Detail & Related papers (2024-07-20T12:42:24Z)
- Differentially Private Multi-Site Treatment Effect Estimation [28.13660104055298]
Most patient data remains siloed in separate hospitals, preventing the design of data-driven healthcare AI systems.
We look at estimating the average treatment effect (ATE), an important task in causal inference for healthcare applications.
We address this through a class of per-site estimation algorithms that reports the ATE estimate and its variance as a quality measure.
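A hedged sketch of such per-site reporting: each site computes a difference-in-means ATE estimate and its variance, perturbs both with Laplace noise, and a coordinator combines sites by inverse-variance weighting. The sensitivity bounds assume outcomes clipped to [0, 1] and are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def site_report(y, t, epsilon):
    """One site's Laplace-noised ATE estimate and variance (outcomes in [0, 1])."""
    y = np.clip(y, 0.0, 1.0)
    y1, y0 = y[t == 1], y[t == 0]
    ate = y1.mean() - y0.mean()
    var = y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0)
    s = 1.0 / min(len(y1), len(y0))                    # rough sensitivity bound
    noisy_ate = ate + rng.laplace(scale=2 * s / epsilon)        # half the budget each
    noisy_var = max(var + rng.laplace(scale=2 * s / epsilon), 1e-6)
    return noisy_ate, noisy_var

def combine(reports):
    """Inverse-variance weighted combination of per-site reports."""
    ates, variances = map(np.asarray, zip(*reports))
    w = 1.0 / variances
    return float(np.sum(w * ates) / np.sum(w))

reports = []
for _ in range(3):
    t = rng.integers(0, 2, size=400)
    y = np.clip(0.4 + 0.2 * t + 0.1 * rng.normal(size=400), 0, 1)
    reports.append(site_report(y, t, epsilon=1.0))
print(combine(reports))                                # near 0.2 in this toy setup
```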
arXiv Detail & Related papers (2023-10-10T01:21:01Z)
- Exploratory Analysis of Federated Learning Methods with Differential Privacy on MIMIC-III [0.7349727826230862]
Federated learning methods offer the possibility of training machine learning models on privacy-sensitive data sets.
We present an evaluation of the impact of different federation and differential privacy techniques when training models on the open-source MIMIC-III dataset.
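A rough FedAvg-style sketch with per-client update clipping and Gaussian noise added at aggregation; the hyperparameters and noise calibration are assumptions of this sketch, and synthetic arrays stand in for MIMIC-III, which is access-restricted.

```python
import numpy as np

rng = np.random.default_rng(4)

def local_update(w, X, y, lr=0.1, clip=1.0):
    """One local least-squares gradient step, clipped in L2 norm."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    delta = -lr * grad
    return delta * min(1.0, clip / (np.linalg.norm(delta) + 1e-12))

def federated_round(w, clients, clip=1.0, noise_mult=1.0):
    """Average clipped client updates and add Gaussian noise at the server."""
    deltas = [local_update(w, X, y, clip=clip) for X, y in clients]
    noise = rng.normal(scale=noise_mult * clip / len(clients), size=w.shape)
    return w + np.mean(deltas, axis=0) + noise

clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
w = np.zeros(3)
for _ in range(10):
    w = federated_round(w, clients)
```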
arXiv Detail & Related papers (2023-02-08T17:27:44Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies that groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Uncertainty-Autoencoder-Based Privacy and Utility Preserving Data Type Conscious Transformation [3.7315964084413173]
We propose an adversarial learning framework that deals with the privacy-utility tradeoff problem under two conditions.
Under data-type ignorant conditions, the privacy mechanism provides a one-hot encoding of categorical features, representing exactly one class.
Under data-type aware conditions, the categorical variables are represented by a collection of scores, one for each class.
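A toy illustration of the two output regimes described above, not the authors' adversarial framework: a categorical feature released either as a hard one-hot vector or as a vector of per-class scores.

```python
import numpy as np

classes = ["A", "B", "C"]
scores = np.array([0.2, 0.7, 0.1])        # data-type aware: one score per class

one_hot = np.zeros(len(classes))           # data-type ignorant: exactly one class
one_hot[np.argmax(scores)] = 1.0

print(dict(zip(classes, scores)), dict(zip(classes, one_hot)))
```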
arXiv Detail & Related papers (2022-05-04T08:40:15Z)
- Differentially Private Estimation of Heterogeneous Causal Effects [9.355532300027727]
We introduce a general meta-algorithm for estimating conditional average treatment effects (CATE) with differential privacy guarantees.
Our meta-algorithm can work with simple, single-stage CATE estimators such as S-learner and more complex multi-stage estimators such as DR and R-learner.
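A minimal S-learner sketch without any differential privacy added: one model is fit on features plus the treatment indicator, and CATE is the difference of its predictions with the treatment toggled. The paper's DP wrapper is omitted and the model choice and toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def s_learner_cate(X, t, y, X_new):
    """Fit one model on (X, t); CATE is the prediction gap when t is toggled."""
    model = GradientBoostingRegressor().fit(np.c_[X, t], y)
    mu1 = model.predict(np.c_[X_new, np.ones(len(X_new))])
    mu0 = model.predict(np.c_[X_new, np.zeros(len(X_new))])
    return mu1 - mu0

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 4))
t = rng.integers(0, 2, size=500)
y = X[:, 0] + (1.0 + X[:, 1]) * t + 0.1 * rng.normal(size=500)
print(s_learner_cate(X, t, y, X[:5]).round(2))   # approximates 1 + X[:5, 1]
```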
arXiv Detail & Related papers (2022-02-22T17:21:18Z)
- Partial sensitivity analysis in differential privacy [58.730520380312676]
We investigate the impact of each input feature on the individual's privacy loss.
We experimentally evaluate our approach on queries over private databases.
We also explore our findings in the context of neural network training on synthetic data.
arXiv Detail & Related papers (2021-09-22T08:29:16Z)
- Sensitivity analysis in differentially private machine learning using hybrid automatic differentiation [54.88777449903538]
We introduce a novel hybrid automatic differentiation (AD) system for sensitivity analysis.
This enables modelling the sensitivity of arbitrary differentiable function compositions, such as the training of neural networks on private data.
Our approach enables principled reasoning about privacy loss in the setting of data processing.
arXiv Detail & Related papers (2021-07-09T07:19:23Z)
- On Deep Learning with Label Differential Privacy [54.45348348861426]
We study the multi-class classification setting where the labels are considered sensitive and ought to be protected.
We propose a new algorithm for training deep neural networks with label differential privacy, and run evaluations on several datasets.
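One standard baseline for label differential privacy is k-ary randomized response applied to the labels only; the sketch below is that baseline, not the paper's algorithm, and the epsilon and class count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def randomize_label(y, num_classes, epsilon):
    """k-ary randomized response: keep y with probability e^eps / (e^eps + k - 1)."""
    p_true = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    if rng.random() < p_true:
        return y
    other = rng.integers(0, num_classes - 1)           # uniform over the other labels
    return int(other if other < y else other + 1)

labels = [0, 2, 1, 1, 3]
noisy = [randomize_label(y, num_classes=4, epsilon=1.0) for y in labels]
```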
arXiv Detail & Related papers (2021-02-11T15:09:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.