Statistical Privacy Guarantees of Machine Learning Preprocessing
Techniques
- URL: http://arxiv.org/abs/2109.02496v1
- Date: Mon, 6 Sep 2021 14:08:47 GMT
- Title: Statistical Privacy Guarantees of Machine Learning Preprocessing
Techniques
- Authors: Ashly Lau and Jonathan Passerat-Palmbach
- Abstract summary: We adapt a privacy violation detection framework based on statistical methods to measure privacy levels of machine learning pipelines.
We apply the newly created framework to show that resampling techniques used when dealing with imbalanced datasets cause the resultant model to leak more privacy.
- Score: 1.198727138090351
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Differential privacy provides strong privacy guarantees for machine learning
applications. Much recent work has been focused on developing differentially
private models, however there has been a gap in other stages of the machine
learning pipeline, in particular during the preprocessing phase. Our
contributions are twofold: we adapt a privacy violation detection framework
based on statistical methods to empirically measure privacy levels of machine
learning pipelines, and apply the newly created framework to show that
resampling techniques used when dealing with imbalanced datasets cause the
resultant model to leak more privacy. These results highlight the need for
developing private preprocessing techniques.
Related papers
- Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied.
Our method operates selectively on data and allows for defining non-sensitive-temporal regions without DP application or combining differential privacy with other privacy techniques within data samples.
arXiv Detail & Related papers (2024-10-22T15:22:53Z) - Scalable Differential Privacy Mechanisms for Real-Time Machine Learning Applications [0.0]
Large language models (LLMs) are increasingly integrated into real-time machine learning applications, where safeguarding user privacy is paramount.
Traditional differential privacy mechanisms often struggle to balance privacy and accuracy, particularly in fast-changing environments with continuously flowing data.
We introduce Scalable Differential Privacy (SDP), a framework tailored for real-time machine learning that emphasizes both robust privacy guarantees and enhanced model performance.
arXiv Detail & Related papers (2024-09-16T20:52:04Z) - Provable Privacy with Non-Private Pre-Processing [56.770023668379615]
We propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms.
Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions.
arXiv Detail & Related papers (2024-03-19T17:54:49Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z) - Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving
Training Data Release for Machine Learning [3.29354893777827]
We introduce a data release framework, 3A (Approximate, Adapt, Anonymize), to maximize data utility for machine learning.
We present experimental evidence showing minimal discrepancy between performance metrics of models trained on real versus privatized datasets.
arXiv Detail & Related papers (2023-07-04T18:37:11Z) - Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms are tight.
They only give tight estimates under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z) - A Survey on Differential Privacy with Machine Learning and Future
Outlook [0.0]
differential privacy is used to protect machine learning models from any attacks and vulnerabilities.
This survey paper presents different differentially private machine learning algorithms categorized into two main categories.
arXiv Detail & Related papers (2022-11-19T14:20:53Z) - Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models are struggling with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z) - A General Framework for Auditing Differentially Private Machine Learning [27.99806936918949]
We present a framework to statistically audit the privacy guarantee conferred by a differentially private machine learner in practice.
Our work develops a general methodology to empirically evaluate the privacy of differentially private machine learning implementations.
arXiv Detail & Related papers (2022-10-16T21:34:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.