On the Inadequacy of Similarity-based Privacy Metrics: Reconstruction Attacks against "Truly Anonymous Synthetic Data"
- URL: http://arxiv.org/abs/2312.05114v1
- Date: Fri, 8 Dec 2023 15:42:28 GMT
- Title: On the Inadequacy of Similarity-based Privacy Metrics: Reconstruction Attacks against "Truly Anonymous Synthetic Data"
- Authors: Georgi Ganev and Emiliano De Cristofaro
- Abstract summary: We review the privacy metrics offered by leading companies in this space and shed light on a few critical flaws in reasoning about privacy entirely via empirical evaluations.
We present a reconstruction attack, ReconSyn, which successfully recovers (i.e., leaks all attributes of) at least 78% of the low-density train records (or outliers) with only black-box access to a single fitted generative model and the privacy metrics.
- Score: 15.0393231456773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training generative models to produce synthetic data is meant to provide a
privacy-friendly approach to data release. However, we get robust guarantees
only when models are trained to satisfy Differential Privacy (DP). Alas, this
is not the standard in industry as many companies use ad-hoc strategies to
empirically evaluate privacy based on the statistical similarity between
synthetic and real data. In this paper, we review the privacy metrics offered
by leading companies in this space and shed light on a few critical flaws in
reasoning about privacy entirely via empirical evaluations. We analyze the
undesirable properties of the most popular metrics and filters and demonstrate
their unreliability and inconsistency through counter-examples. We then present
a reconstruction attack, ReconSyn, which successfully recovers (i.e., leaks all
attributes of) at least 78% of the low-density train records (or outliers) with
only black-box access to a single fitted generative model and the privacy
metrics. Finally, we show that applying DP only to the model or using
low-utility generators does not mitigate ReconSyn as the privacy leakage
predominantly comes from the metrics. Overall, our work serves as a warning to
practitioners not to deviate from established privacy-preserving mechanisms.
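To make the critique concrete, here is a minimal sketch, under my own assumptions, of the kind of similarity-based metrics and pass/fail filters the abstract refers to (distance to closest record and identical match share). Function names and thresholds are illustrative, not any vendor's actual API; the comment in similarity_filter captures the intuition ReconSyn exploits, namely that the metric outcome itself depends on the training data, so an adversary who can observe it repeatedly gains an oracle about the train set.

```python
# Minimal sketch (my own assumptions, not any vendor's implementation) of the
# similarity-based privacy metrics and filters the paper critiques.
import numpy as np

def distance_to_closest_record(synthetic: np.ndarray, train: np.ndarray) -> np.ndarray:
    """For each synthetic row, the Euclidean distance to its nearest training row."""
    diffs = synthetic[:, None, :] - train[None, :, :]        # (n_synth, n_train, d)
    return np.linalg.norm(diffs, axis=-1).min(axis=1)

def identical_match_share(synthetic: np.ndarray, train: np.ndarray) -> float:
    """Fraction of synthetic rows that exactly replicate some training row."""
    train_rows = {tuple(row) for row in train.tolist()}
    return sum(tuple(row) in train_rows for row in synthetic.tolist()) / len(synthetic)

def similarity_filter(synthetic, train, dcr_threshold=0.1, ims_threshold=0.0):
    """Pass/fail "privacy test" typical of purely empirical evaluations.
    The binary outcome depends on the training data, so repeatedly querying it on
    attacker-chosen candidates leaks information about which records were in the
    train set -- the intuition behind the ReconSyn attack."""
    dcr_ok = distance_to_closest_record(synthetic, train).min() > dcr_threshold
    ims_ok = identical_match_share(synthetic, train) <= ims_threshold
    return dcr_ok and ims_ok

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(500, 5))
    synthetic = rng.normal(size=(500, 5))            # stand-in for generator output
    print(similarity_filter(synthetic, train))
```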
Related papers
- Defining 'Good': Evaluation Framework for Synthetic Smart Meter Data [14.779917834583577]
We show that standard privacy attack methods are inadequate for assessing privacy risks of smart meter datasets.
We propose an improved method by injecting training data with implausible outliers, then launching privacy attacks directly on these outliers.
arXiv Detail & Related papers (2024-07-16T14:41:27Z)
- Achilles' Heels: Vulnerable Record Identification in Synthetic Data Publishing [9.061271587514215]
We propose a principled vulnerable record identification technique for synthetic data publishing.
We show it to strongly outperform previous ad-hoc methods across datasets and generators.
We show it to accurately identify vulnerable records when synthetic data generators are made differentially private.
arXiv Detail & Related papers (2023-06-17T09:42:46Z)
- Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensuring fidelity to the source data, and assessing utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model (a density-ratio sketch follows this entry).
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
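The one-line summary above leaves the DOMIAS scoring rule implicit; the following is a hedged sketch, under my own assumptions, of a density-ratio membership score in that spirit: a candidate record is scored by the ratio of its estimated density under the synthetic data to its density under a reference sample, with high ratios suggesting local overfitting. The KDE estimator and toy data are illustrative, not the paper's setup.

```python
# Hedged sketch (my own assumptions) of a density-ratio membership score in the
# spirit of DOMIAS: records where the synthetic-data density exceeds the density
# under a reference sample suggest local overfitting of the generator.
import numpy as np
from scipy.stats import gaussian_kde

def membership_scores(candidates, synthetic, reference, eps=1e-12):
    # gaussian_kde expects arrays of shape (n_features, n_samples)
    p_synth = gaussian_kde(synthetic.T)(candidates.T)
    p_ref = gaussian_kde(reference.T)(candidates.T)
    return p_synth / (p_ref + eps)   # ratio > 1 hints "member", < 1 hints "non-member"

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference = rng.normal(size=(2000, 3))
    synthetic = rng.normal(scale=0.9, size=(2000, 3))   # stand-in for generator output
    candidates = rng.normal(size=(10, 3))
    print(membership_scores(candidates, synthetic, reference))
```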
- Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe [32.63295550058343]
We show that a simple and practical recipe in the text domain is effective in generating useful synthetic text with strong privacy protection.
Our method produces synthetic text that is competitive in terms of utility with its non-private counterpart (a DP-SGD sketch follows this entry).
arXiv Detail & Related papers (2022-10-25T21:21:17Z)
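The "simple and practical recipe" in the entry above is, at a high level, differentially private fine-tuning of a generative language model followed by sampling. As a hedged illustration of the DP-SGD step involved (not the paper's actual pipeline, which would use a vetted library and a proper privacy accountant), here is a from-scratch per-example clipping and noising step on a toy model.

```python
# Hedged, from-scratch DP-SGD step on a toy model (illustration only; real DP
# fine-tuning would use a vetted implementation and a privacy accountant).
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, optimizer,
                max_grad_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step: clip each per-example gradient, then add Gaussian noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(batch_x, batch_y):                     # per-example gradients
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        clip = (max_grad_norm / (norm + 1e-6)).clamp(max=1.0)
        for s, g in zip(summed, grads):
            s += g * clip                                  # clipped contribution
    batch_size = len(batch_x)
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * noise_multiplier * max_grad_norm
        p.grad = (s + noise) / batch_size                  # noisy averaged gradient
    optimizer.step()

if __name__ == "__main__":
    model = torch.nn.Linear(8, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(16, 8), torch.randint(0, 2, (16,))
    dp_sgd_step(model, torch.nn.functional.cross_entropy, x, y, opt)
    print("one DP-SGD step done")
```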
Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, yet it can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z)
- Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it faces challenges when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity (a baseline $k$-anonymity check is sketched after this entry).
arXiv Detail & Related papers (2022-07-13T17:09:25Z)
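The summary above names smooth-$k$-anonymity without defining it, so the sketch below only illustrates the standard $k$-anonymity notion it builds on: every combination of quasi-identifier values must appear in at least $k$ records. The columns and data are made up for illustration.

```python
# Baseline k-anonymity check (illustration only; not the smooth-k-anonymity
# variant, which the summary does not define).
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """records: list of dicts; quasi_identifiers: column names; k: group-size threshold."""
    groups = Counter(tuple(r[c] for c in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

if __name__ == "__main__":
    data = [
        {"zip": "10001", "age": "30-39", "diagnosis": "flu"},
        {"zip": "10001", "age": "30-39", "diagnosis": "cold"},
        {"zip": "10002", "age": "40-49", "diagnosis": "flu"},
    ]
    print(is_k_anonymous(data, ["zip", "age"], k=2))  # False: one group has size 1
```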
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- Defending against Reconstruction Attacks with Rényi Differential Privacy [72.1188520352079]
Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model.
Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget.
We show that, for the same mechanism, we can derive privacy guarantees for reconstruction attacks that are better than the traditional ones from the literature (an illustrative Rényi-DP accounting sketch follows this entry).
arXiv Detail & Related papers (2022-02-15T18:09:30Z)
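The entry above states that tighter reconstruction guarantees follow from Rényi DP but gives no numbers; as a hedged illustration of the standard RDP bookkeeping involved (not the paper's improved reconstruction-specific bounds), the sketch below computes the RDP of a single Gaussian mechanism and converts it to an (ε, δ)-DP guarantee.

```python
# Hedged numerical sketch of standard Renyi-DP accounting: RDP of the Gaussian
# mechanism and the usual conversion to (eps, delta)-DP (Mironov-style bound).
import math

def gaussian_rdp(alpha, sensitivity, sigma):
    """(alpha, eps)-RDP of the Gaussian mechanism: eps = alpha * Delta^2 / (2 sigma^2)."""
    return alpha * sensitivity**2 / (2 * sigma**2)

def rdp_to_dp(alpha, rdp_eps, delta):
    """(alpha, eps)-RDP implies (eps + log(1/delta)/(alpha - 1), delta)-DP."""
    return rdp_eps + math.log(1 / delta) / (alpha - 1)

if __name__ == "__main__":
    sigma, delta = 4.0, 1e-5
    # Optimize over a small grid of Renyi orders for the tightest (eps, delta) bound.
    best = min(rdp_to_dp(a, gaussian_rdp(a, 1.0, sigma), delta)
               for a in [1.5, 2, 4, 8, 16, 32, 64])
    print(f"single Gaussian query, sigma={sigma}: eps ~ {best:.3f} at delta={delta}")
```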
- PEARL: Data Synthesis via Private Embeddings and Adversarial Reconstruction Learning [1.8692254863855962]
We propose a new framework for synthesizing data using deep generative models in a differentially private manner.
Within our framework, sensitive data are sanitized with rigorous privacy guarantees in a one-shot fashion.
Our proposal has theoretical guarantees of performance, and empirical evaluations on multiple datasets show that our approach outperforms other methods at reasonable levels of privacy.
arXiv Detail & Related papers (2021-06-08T18:00:01Z)
- Causally Constrained Data Synthesis for Private Data Release [36.80484740314504]
Using synthetic data that reflects certain statistical properties of the original data preserves its privacy.
Prior works utilize differentially private data release mechanisms to provide formal privacy guarantees.
We propose incorporating causal information into the training process to favorably modify the aforementioned trade-off.
arXiv Detail & Related papers (2021-05-27T13:46:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.