Synthetic Data: Revisiting the Privacy-Utility Trade-off
- URL: http://arxiv.org/abs/2407.07926v1
- Date: Tue, 9 Jul 2024 14:48:43 GMT
- Title: Synthetic Data: Revisiting the Privacy-Utility Trade-off
- Authors: Fatima Jahan Sarmin, Atiquer Rahman Sarkar, Yang Wang, Noman Mohammed
- Abstract summary: An article stated that synthetic data does not provide a better trade-off between privacy and utility than traditional anonymization techniques.
The article also claims to have identified a breach in the differential privacy guarantees provided by PATEGAN and PrivBayes.
We analyzed the implementation of the privacy game described in the article and found that it operated in a highly specialized and constrained environment.
- Score: 4.832355454351479
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synthetic data has been considered a better privacy-preserving alternative to traditionally sanitized data across various applications. However, a recent article challenges this notion, stating that synthetic data does not provide a better trade-off between privacy and utility than traditional anonymization techniques, and that it leads to unpredictable utility loss and highly unpredictable privacy gain. The article also claims to have identified a breach in the differential privacy guarantees provided by PATEGAN and PrivBayes. When a study claims to refute or invalidate prior findings, it is crucial to verify and validate the study. In our work, we analyzed the implementation of the privacy game described in the article and found that it operated in a highly specialized and constrained environment, which limits the applicability of its findings to general cases. Our exploration also revealed that the game did not satisfy a crucial precondition concerning data distributions, which contributed to the perceived violation of the differential privacy guarantees offered by PATEGAN and PrivBayes. We also conducted a privacy-utility trade-off analysis in a more general and unconstrained environment. Our experimentation demonstrated that synthetic data achieves a more favorable privacy-utility trade-off compared to the provided implementation of k-anonymization, thereby reaffirming earlier conclusions.
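For readers unfamiliar with the baseline the abstract compares against: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The following is a minimal sketch of that check only. It is not the k-anonymization implementation evaluated in the paper, and the function names, column names, and records are hypothetical.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    # An empty table has no groups; report 0 in that case.
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k records."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical toy table: age_range and zip_prefix are treated as
# quasi-identifiers, diagnosis as the sensitive attribute.
records = [
    {"age_range": "30-39", "zip_prefix": "R3T", "diagnosis": "flu"},
    {"age_range": "30-39", "zip_prefix": "R3T", "diagnosis": "asthma"},
    {"age_range": "40-49", "zip_prefix": "R2M", "diagnosis": "flu"},
    {"age_range": "40-49", "zip_prefix": "R2M", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["age_range", "zip_prefix"], k=2))
print(is_k_anonymous(records, ["age_range", "zip_prefix"], k=3))
```

Coarser generalization of the quasi-identifiers raises the smallest group size, which tends to improve privacy at the cost of utility; that tension is the trade-off the abstract refers to.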
Related papers
- Convergent Differential Privacy Analysis for General Federated Learning: the $f$-DP Perspective [57.35402286842029]
Federated learning (FL) is an efficient collaborative training paradigm with a focus on local privacy.
Differential privacy (DP) is a classical approach to capture and ensure the reliability of private protections.
arXiv Detail & Related papers (2024-08-28T08:22:21Z) - An applied Perspective: Estimating the Differential Identifiability Risk of an Exemplary SOEP Data Set [2.66269503676104]
We show how to compute the risk metric efficiently for a set of basic statistical queries.
Our empirical analysis based on an extensive, real-world scientific data set expands the knowledge on how to compute risks under realistic conditions.
arXiv Detail & Related papers (2024-07-04T17:50:55Z) - Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z) - A Summary of Privacy-Preserving Data Publishing in the Local Setting [0.6749750044497732]
Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it.
We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
arXiv Detail & Related papers (2023-12-19T04:23:23Z) - Practical considerations on using private sampling for synthetic data [1.3654846342364308]
Differential privacy for synthetic data generation has received much attention due to its ability to preserve privacy while allowing the synthetic data to be used freely.
Private sampling is the first noise-free method to construct differentially private synthetic data with rigorous bounds for privacy and accuracy.
We provide an implementation of the private sampling algorithm and discuss the realism of its constraints in practical cases.
arXiv Detail & Related papers (2023-12-12T10:20:04Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - Breaking the Communication-Privacy-Accuracy Tradeoff with $f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP).
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z) - How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z) - No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z) - Causally Constrained Data Synthesis for Private Data Release [36.80484740314504]
Using synthetic data which reflects certain statistical properties of the original data preserves the privacy of the original data.
Prior works utilize differentially private data release mechanisms to provide formal privacy guarantees.
We propose incorporating causal information into the training process to favorably modify the aforementioned trade-off.
arXiv Detail & Related papers (2021-05-27T13:46:57Z) - On the Privacy-Utility Tradeoff in Peer-Review Data Analysis [34.0435377376779]
A major impediment to research on improving peer review is the unavailability of peer-review data.
We propose a framework for privacy-preserving release of certain conference peer-review data.
arXiv Detail & Related papers (2020-06-29T21:08:21Z)