Spending Privacy Budget Fairly and Wisely
- URL: http://arxiv.org/abs/2204.12903v1
- Date: Wed, 27 Apr 2022 13:13:56 GMT
- Title: Spending Privacy Budget Fairly and Wisely
- Authors: Lucas Rosenblatt and Joshua Allen and Julia Stoyanovich
- Abstract summary: Differentially private (DP) synthetic data generation is a practical method for improving access to data.
One issue inherent to DP is that the "privacy budget" is generally "spent" evenly across features in the data set.
We develop ensemble methods that distribute the privacy budget "wisely" to maximize predictive accuracy of models trained on DP data.
- Score: 7.975975942400017
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentially private (DP) synthetic data generation is a practical method
for improving access to data as a means to encourage productive partnerships.
One issue inherent to DP is that the "privacy budget" is generally "spent"
evenly across features in the data set. This leads to good statistical parity
with the real data, but can undervalue the conditional probabilities and
marginals that are critical for predictive quality of synthetic data. Further,
loss of predictive quality may be non-uniform across the data set, with subsets
that correspond to minority groups potentially suffering a higher loss.
In this paper, we develop ensemble methods that distribute the privacy budget
"wisely" to maximize predictive accuracy of models trained on DP data, and
"fairly" to bound potential disparities in accuracy across groups and reduce
inequality. Our methods are based on the insights that feature importance can
inform how privacy budget is allocated, and, further, that per-group feature
importance and fairness-related performance objectives can be incorporated in
the allocation. These insights make our methods tunable to social contexts,
allowing data owners to produce balanced synthetic data for predictive
analysis.
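The allocation insight lends itself to a short sketch. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it scores feature importance with a random forest (an assumption here; in practice this step itself must be done privately or on public data), splits a total budget epsilon across features in proportion to those scores, and releases one Laplace-noised marginal per feature. The `allocate_budget` and `noisy_marginal` helpers and the per-feature floor are hypothetical.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def allocate_budget(importances, eps_total, floor=0.05):
    """Split eps_total across features in proportion to importance.
    A small uniform floor (an illustrative choice, not the paper's rule)
    keeps low-importance features from being starved of budget."""
    w = np.maximum(np.asarray(importances, dtype=float), 0.0)
    w = floor / len(w) + (1.0 - floor) * w / w.sum()
    return eps_total * w

def noisy_marginal(column, bins, eps):
    """Release a 1-D histogram with Laplace noise; counting queries
    have L1 sensitivity 1, so the noise scale is 1 / eps."""
    counts, edges = np.histogram(column, bins=bins)
    noisy = counts + np.random.laplace(scale=1.0 / eps, size=counts.shape)
    return np.clip(noisy, 0.0, None), edges

# Toy stand-in for the sensitive dataset.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)

# Importance scores from a quick non-private model (illustrative only).
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
eps_per_feature = allocate_budget(rf.feature_importances_, eps_total=1.0)

marginals = [noisy_marginal(X[:, j], bins=16, eps=eps_per_feature[j])
             for j in range(X.shape[1])]
```
By sequential composition the per-feature releases together spend eps_total; a fairness-aware variant in the spirit of the abstract would compute importances per group and blend the allocations to bound accuracy disparities.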
Related papers
- DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing [0.8739101659113155]
We introduce an effective data publishing algorithm, DP-CDA.
Our proposed algorithm generates synthetic datasets by randomly mixing data in a class-specific manner and injecting carefully tuned randomness to ensure privacy guarantees (the mixing idea is sketched after this item).
Our results indicate that synthetic datasets produced using the DP-CDA can achieve superior utility compared to those generated by traditional data publishing algorithms, even when subject to the same privacy requirements.
arXiv Detail & Related papers (2024-11-25T06:14:06Z)
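Reading only from this abstract, the class-specific mixing step might look roughly like the sketch below. The function name, mixing degree k, and noise scale sigma are all assumptions for illustration; the noise calibration that yields a formal DP guarantee is in the paper, not here.
```python
import numpy as np

def mix_and_perturb(X, y, k=4, sigma=0.5, n_out=1000, seed=None):
    """Rough sketch of class-specific random mixing: each synthetic
    record is the mean of k randomly chosen same-class records plus
    Gaussian noise. Assumes every class has at least k records."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    out_X, out_y = [], []
    for _ in range(n_out):
        c = rng.choice(classes)
        idx = rng.choice(np.where(y == c)[0], size=k, replace=False)
        out_X.append(X[idx].mean(axis=0) + rng.normal(0.0, sigma, X.shape[1]))
        out_y.append(c)
    return np.array(out_X), np.array(out_y)
```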
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates by combining limited human-labeled data with abundant model predictions.
We propose a method called Stratified Prediction-Powered Inference (StratPPI).
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies (the basic estimator is sketched after this item).
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
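For orientation, the basic (unstratified) PPI estimate of a mean combines abundant model predictions with a rectifier measured on the small labeled set; StratPPI applies this within strata and reweights. A minimal sketch, with function and variable names assumed:
```python
import numpy as np

def ppi_mean(preds_unlabeled, preds_labeled, labels):
    """Basic prediction-powered estimate of a mean: the average model
    prediction on a large unlabeled set, corrected by the average
    prediction error measured on a small human-labeled set."""
    rectifier = np.mean(np.asarray(labels) - np.asarray(preds_labeled))
    return np.mean(preds_unlabeled) + rectifier

def strat_ppi_mean(strata, weights):
    """Stratified variant, sketched: PPI within each stratum, combined
    with known stratum weights. The paper's allocation and variance
    analysis are more involved than this."""
    return sum(w * ppi_mean(pu, pl, yl)
               for w, (pu, pl, yl) in zip(weights, strata))
```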
- Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting (the bounding step is sketched after this item).
arXiv Detail & Related papers (2024-03-07T21:22:07Z)
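The modification can be pictured as follows: rather than releasing a value plus unbounded Gaussian noise, the output is projected onto a bounded interval, and the paper shows this bounding amplifies data-dependent privacy guarantees. A minimal sketch of the clamped variant; the interval and sigma here are illustrative, and no privacy accounting is performed:
```python
import numpy as np

def bounded_gaussian(value, sigma, lo, hi, seed=None):
    """Gaussian mechanism with bounded support, sketched via clamping:
    add N(0, sigma^2) noise, then project the release onto [lo, hi].
    The amplification analysis under pDP/FIL accounting is the paper's
    contribution and is not reproduced here."""
    rng = np.random.default_rng(seed)
    return float(np.clip(value + rng.normal(0.0, sigma), lo, hi))
```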
- Learning Antidote Data to Individual Unfairness [23.119278763970037]
Individual fairness is a vital notion to describe fair treatment for individual cases.
Previous studies characterize individual fairness as a prediction-invariant problem.
We show our method resists individual unfairness at a minimal or zero cost to predictive utility.
arXiv Detail & Related papers (2022-11-29T03:32:39Z)
- Improved Generalization Guarantees in Restricted Data Models [16.193776814471768]
Differential privacy is known to protect against threats to validity incurred due to adaptive, or exploratory, data analysis.
We show that, under this assumption, it is possible to "re-use" privacy budget on different portions of the data, significantly improving accuracy without increasing the risk of overfitting.
arXiv Detail & Related papers (2022-07-20T16:04:12Z)
- Data Sharing Markets [95.13209326119153]
We study a setup where each agent can be both buyer and seller of data.
We consider two cases: bilateral data exchange (trading data with data) and unilateral data exchange (trading data with money).
arXiv Detail & Related papers (2021-07-19T06:00:34Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
- Really Useful Synthetic Data -- A Framework to Evaluate the Quality of Differentially Private Synthetic Data [2.538209532048867]
Recent advances in generating synthetic data with principled privacy protections are a crucial step toward sharing statistical information in a privacy-preserving way.
To further optimise the inherent trade-off between data privacy and data quality, it is necessary to think closely about the latter.
We develop a framework to evaluate the quality of differentially private synthetic data from an applied researcher's perspective.
arXiv Detail & Related papers (2020-04-16T16:24:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.