A Cautionary Tale: On the Role of Reference Data in Empirical Privacy
Defenses
- URL: http://arxiv.org/abs/2310.12112v1
- Date: Wed, 18 Oct 2023 17:07:07 GMT
- Title: A Cautionary Tale: On the Role of Reference Data in Empirical Privacy
Defenses
- Authors: Caelin G. Kaplan, Chuan Xu, Othmane Marfoq, Giovanni Neglia, Anderson
Santana de Oliveira
- Abstract summary: We propose a baseline defense that enables the utility-privacy tradeoff with respect to both training and reference data to be easily understood.
Our experiments show that, surprisingly, it outperforms the most well-studied and current state-of-the-art empirical privacy defenses.
- Score: 12.34501903200183
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Within the realm of privacy-preserving machine learning, empirical privacy
defenses have been proposed as a solution to achieve satisfactory levels of
training data privacy without a significant drop in model utility. Most
existing defenses against membership inference attacks assume access to
reference data, defined as an additional dataset coming from the same (or a
similar) underlying distribution as training data. Despite the common use of
reference data, previous works are notably reticent about defining and
evaluating reference data privacy. As gains in model utility and/or training
data privacy may come at the expense of reference data privacy, it is essential
that all three aspects are duly considered. In this paper, we first examine the
availability of reference data and its privacy treatment in previous works and
demonstrate its necessity for fairly comparing defenses. Second, we propose a
baseline defense that enables the utility-privacy tradeoff with respect to both
training and reference data to be easily understood. Our method is formulated
as an empirical risk minimization with a constraint on the generalization
error, which, in practice, can be evaluated as a weighted empirical risk
minimization (WERM) over the training and reference datasets (a sketch of this
objective follows the abstract). Although we
conceived of WERM as a simple baseline, our experiments show that,
surprisingly, it outperforms the most well-studied and current state-of-the-art
empirical privacy defenses using reference data for nearly all relative privacy
levels of reference and training data. Our investigation also reveals that
these existing methods are unable to effectively trade off reference data
privacy for model utility and/or training data privacy. Overall, our work
highlights the need for a proper evaluation of the triad model utility /
training data privacy / reference data privacy when comparing privacy defenses.
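To make the WERM objective from the abstract concrete, here is a minimal, illustrative sketch. The logistic-regression model, the mixing weight `lam`, and the function name `werm_logistic_regression` are assumptions introduced purely for illustration; the paper only states that the constrained ERM can, in practice, be evaluated as a weighted empirical risk over the training and reference datasets.
```python
# A minimal, illustrative sketch of weighted empirical risk minimization (WERM)
# over a training set and a reference set. The logistic-regression model and the
# mixing weight `lam` are assumptions for illustration, not the paper's method.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def werm_logistic_regression(X_train, y_train, X_ref, y_ref,
                             lam=0.5, lr=0.1, epochs=200):
    """Minimize (1 - lam) * R_train(theta) + lam * R_ref(theta), where each R
    is the average logistic loss on the corresponding dataset."""
    theta = np.zeros(X_train.shape[1])
    for _ in range(epochs):
        # Gradient of the mean logistic loss on each dataset.
        g_train = X_train.T @ (sigmoid(X_train @ theta) - y_train) / len(y_train)
        g_ref = X_ref.T @ (sigmoid(X_ref @ theta) - y_ref) / len(y_ref)
        # Weighted combination: lam shifts risk between the two datasets.
        theta -= lr * ((1.0 - lam) * g_train + lam * g_ref)
    return theta
```
Under this reading, sweeping `lam` from 0 to 1 shifts the fitted model's reliance from the training data toward the reference data, tracing the model utility / training data privacy / reference data privacy tradeoff that the abstract argues must be evaluated jointly.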
Related papers
- Empirical Privacy Evaluations of Generative and Predictive Machine Learning Models -- A review and challenges for practice [0.3069335774032178]
It is crucial to empirically assess the privacy risks associated with the generated synthetic data before deploying generative technologies.
This paper outlines the key concepts and assumptions underlying empirical privacy evaluation in machine learning-based generative and predictive models.
arXiv Detail & Related papers (2024-11-19T12:19:28Z)
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation [4.772368796656325]
In practice, controlled data access remains a mainstream method for protecting data privacy in many industrial and research environments.
We developed the demo prototype FT-PrivacyScore to show that it's possible to efficiently and quantitatively estimate the privacy risk of participating in a model fine-tuning task.
arXiv Detail & Related papers (2024-10-30T02:41:26Z)
- Towards Split Learning-based Privacy-Preserving Record Linkage [49.1574468325115]
Split Learning has been introduced to facilitate applications where user data privacy is a requirement.
In this paper, we investigate the potential of Split Learning for Privacy-Preserving Record Matching.
arXiv Detail & Related papers (2024-09-02T09:17:05Z)
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- $\alpha$-Mutual Information: A Tunable Privacy Measure for Privacy Protection in Data Sharing [4.475091558538915]
This paper adopts Arimoto's $\alpha$-Mutual Information as a tunable privacy measure.
We formulate a general distortion-based mechanism that manipulates the original data to offer privacy protection.
arXiv Detail & Related papers (2023-10-27T16:26:14Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining [75.25943383604266]
We question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving.
We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy.
We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful.
arXiv Detail & Related papers (2022-12-13T10:41:12Z)
- No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z)
- Personalized PATE: Differential Privacy for Machine Learning with Individual Privacy Guarantees [1.2691047660244335]
We propose three novel methods to support training an ML model with different personalized privacy guarantees within the training data.
Our experiments show that our personalized privacy methods yield higher accuracy models than the non-personalized baseline.
arXiv Detail & Related papers (2022-02-21T20:16:27Z)