Statistical Privacy Guarantees of Machine Learning Preprocessing
Techniques
- URL: http://arxiv.org/abs/2109.02496v1
- Date: Mon, 6 Sep 2021 14:08:47 GMT
- Title: Statistical Privacy Guarantees of Machine Learning Preprocessing
Techniques
- Authors: Ashly Lau and Jonathan Passerat-Palmbach
- Abstract summary: We adapt a privacy violation detection framework based on statistical methods to measure privacy levels of machine learning pipelines.
We apply the newly created framework to show that resampling techniques used when dealing with imbalanced datasets cause the resultant model to leak more privacy.
- Score: 1.198727138090351
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Differential privacy provides strong privacy guarantees for machine learning
applications. Much recent work has been focused on developing differentially
private models, however there has been a gap in other stages of the machine
learning pipeline, in particular during the preprocessing phase. Our
contributions are twofold: we adapt a privacy violation detection framework
based on statistical methods to empirically measure privacy levels of machine
learning pipelines, and apply the newly created framework to show that
resampling techniques used when dealing with imbalanced datasets cause the
resultant model to leak more privacy. These results highlight the need for
developing private preprocessing techniques.
Related papers
- Provable Privacy with Non-Private Pre-Processing [56.770023668379615]
We propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms.
Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions.
arXiv Detail & Related papers (2024-03-19T17:54:49Z) - Conditional Density Estimations from Privacy-Protected Data [0.0]
We propose simulation-based inference methods from privacy-protected datasets.
We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
arXiv Detail & Related papers (2023-10-19T14:34:17Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z) - Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving
Training Data Release for Machine Learning [3.29354893777827]
We introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning.
We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets.
arXiv Detail & Related papers (2023-07-04T18:37:11Z) - Learning Differentially Private Probabilistic Models for
Privacy-Preserving Image Generation [67.47979276739144]
We propose learning differentially private probabilistic models to generate high-resolution images with differential privacy guarantee.
Our approach can generate images up to 256x256 with remarkable visual quality and data utility.
arXiv Detail & Related papers (2023-05-18T02:51:17Z) - Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms are tight.
They only give tight estimates under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z) - A Survey on Differential Privacy with Machine Learning and Future
Outlook [0.0]
differential privacy is used to protect machine learning models from any attacks and vulnerabilities.
This survey paper presents different differentially private machine learning algorithms categorized into two main categories.
arXiv Detail & Related papers (2022-11-19T14:20:53Z) - Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models are struggling with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z) - A General Framework for Auditing Differentially Private Machine Learning [27.99806936918949]
We present a framework to statistically audit the privacy guarantee conferred by a differentially private machine learner in practice.
Our work develops a general methodology to empirically evaluate the privacy of differentially private machine learning implementations.
arXiv Detail & Related papers (2022-10-16T21:34:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.