TAPAS: a Toolbox for Adversarial Privacy Auditing of Synthetic Data
- URL: http://arxiv.org/abs/2211.06550v1
- Date: Sat, 12 Nov 2022 02:26:54 GMT
- Title: TAPAS: a Toolbox for Adversarial Privacy Auditing of Synthetic Data
- Authors: Florimond Houssiau, James Jordon, Samuel N. Cohen, Owen Daniel, Andrew
Elliott, James Geddes, Callum Mole, Camila Rangel-Smith, Lukasz Szpruch
- Abstract summary: We present TAPAS, a toolbox of attacks to evaluate synthetic data privacy under a wide range of scenarios.
These attacks include generalizations of prior works and novel attacks.
We also introduce a general framework for reasoning about privacy threats to synthetic data and showcase TAPAS on several examples.
- Score: 12.541414528872544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personal data collected at scale promises to improve decision-making and
accelerate innovation. However, sharing and using such data raises serious
privacy concerns. A promising solution is to produce synthetic data, artificial
records to share instead of real data. Since synthetic records are not linked
to real persons, this intuitively prevents classical re-identification attacks.
However, this is insufficient to protect privacy. Here we present TAPAS, a
toolbox of attacks to evaluate synthetic data privacy under a wide range of
scenarios. These attacks include generalizations of prior works and novel
attacks. We also introduce a general framework for reasoning about privacy
threats to synthetic data and showcase TAPAS on several examples.
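The core workflow such a toolbox supports is: fix a threat model, run an attack against the released synthetic data, and report the attacker's success. Below is a minimal, library-agnostic sketch of a membership inference audit in Python; it does not use the TAPAS API, and the toy generator, the distance-based attack score, and all parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_generator(real_data, n_synth=200):
    """Illustrative 'generator': Gaussian noise around resampled real records."""
    idx = rng.integers(0, len(real_data), size=n_synth)
    return real_data[idx] + rng.normal(scale=0.1, size=(n_synth, real_data.shape[1]))

def mia_score(synthetic, target):
    """Attack score: closer synthetic records give more evidence the target was in training."""
    dists = np.linalg.norm(synthetic - target, axis=1)
    return -dists.min()  # higher score = 'member' guess

def audit(population, target_idx=0, n_trials=200, n_train=100):
    """Labelled-world audit: train the generator on worlds with and without the target,
    then check how well the attack score separates the two worlds."""
    target = population[target_idx]
    rest = np.delete(population, target_idx, axis=0)
    scores, labels = [], []
    for _ in range(n_trials):
        member = rng.random() < 0.5
        train = rest[rng.choice(len(rest), n_train, replace=False)]
        if member:
            train = np.vstack([train, target])
        synth = toy_generator(train)
        scores.append(mia_score(synth, target))
        labels.append(member)
    scores, labels = np.array(scores), np.array(labels)
    preds = scores > np.median(scores)   # crude threshold decision rule
    return (preds == labels).mean()

population = rng.normal(size=(1000, 5))
print("membership inference accuracy:", audit(population))
```

In a real audit, the toy generator would be replaced by the generator under evaluation and the distance score by a stronger attack; the accuracy (or AUC) over many labelled worlds is the reported privacy metric.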
Related papers
- Differentially Private Synthetic Data Release for Topics API Outputs [63.79476766779742]
We focus on one Privacy-Preserving Ads API: the Topics API, part of Google Chrome's Privacy Sandbox.
We generate a differentially-private dataset that closely matches the re-identification risk properties of the real Topics API data.
We hope this will enable external researchers to analyze the API in-depth and replicate prior and future work on a realistic large-scale dataset.
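A generic recipe for producing such a differentially private dataset (not the paper's method, which is tailored to Topics API data) is to add calibrated noise to aggregate counts and resample synthetic records from the noisy distribution. A minimal one-dimensional Laplace-histogram sketch, with illustrative parameters:

```python
import numpy as np

def dp_synthetic_histogram(data, bins, epsilon, n_synth, rng):
    """Generic DP synthesis sketch: add Laplace noise (sensitivity 1) to histogram
    counts, clip negatives, renormalise, and resample synthetic records."""
    counts, edges = np.histogram(data, bins=bins)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    probs = np.clip(noisy, 0, None)
    probs = probs / probs.sum()
    chosen = rng.choice(len(probs), size=n_synth, p=probs)
    # Sample uniformly inside each chosen bin
    return rng.uniform(edges[chosen], edges[chosen + 1])

rng = np.random.default_rng(5)
real = rng.normal(size=5000)
synth = dp_synthetic_histogram(real, bins=50, epsilon=1.0, n_synth=5000, rng=rng)
print("real mean/std:", real.mean(), real.std(), "synthetic:", synth.mean(), synth.std())
```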
arXiv Detail & Related papers (2025-06-30T13:46:57Z) - The Data Sharing Paradox of Synthetic Data in Healthcare [9.66493160220239]
This article discusses the paradoxical situation where synthetic data is designed for data sharing but is often still restricted.
We discuss how the field should move forward to mitigate this issue.
arXiv Detail & Related papers (2025-03-26T16:06:40Z) - Synthetic Data Privacy Metrics [2.1213500139850017]
We review the pros and cons of popular metrics that include simulations of adversarial attacks.
We also review current best practices for amending generative models to enhance the privacy of the data they create.
arXiv Detail & Related papers (2025-01-07T17:02:33Z) - Synthetic Data Outliers: Navigating Identity Disclosure [3.8811062755861956]
We analyze the privacy of synthetic data with respect to outliers.
Our main findings suggest that re-identification of outliers via linkage attacks is feasible and easily achieved.
Additional safeguards such as differential privacy can prevent re-identification, albeit at the expense of the data utility.
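A simplified sketch of the kind of linkage attack referred to above: score how isolated each real record is, then link the most isolated (outlier) records to their closest synthetic record and count unusually close matches as re-identifications. The leaky toy generator, the outlier score, and the distance threshold are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
real = rng.normal(size=(500, 4))
real[:5] += 6.0                                               # a handful of clear outliers
synthetic = real + rng.normal(scale=0.05, size=real.shape)    # leaky generator for illustration

# Score how isolated each real record is (mean distance to its 5 nearest real neighbours)
nn_real = NearestNeighbors(n_neighbors=6).fit(real)
d, _ = nn_real.kneighbors(real)            # column 0 is the record itself
outlier_score = d[:, 1:].mean(axis=1)
outliers = np.argsort(outlier_score)[-5:]

# Linkage: distance from each real outlier to its closest synthetic record
nn_synth = NearestNeighbors(n_neighbors=1).fit(synthetic)
link_dist, _ = nn_synth.kneighbors(real[outliers])

# Treat a match as a re-identification if it is much closer than typical record spacing
threshold = np.median(d[:, 1])
print("re-identified outliers:", (link_dist[:, 0] < threshold).sum(), "of", len(outliers))
```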
arXiv Detail & Related papers (2024-06-04T19:35:44Z) - The Inadequacy of Similarity-based Privacy Metrics: Privacy Attacks against "Truly Anonymous" Synthetic Datasets [12.730435519914415]
We examine the privacy metrics used in real-world synthetic data deployments and demonstrate their unreliability in several ways.
We introduce ReconSyn, a reconstruction attack that generates multiple synthetic datasets that are considered private by the metrics but actually leak information unique to individual records.
We show that ReconSyn recovers 78-100% of the outliers in the training data with only black-box access to a single fitted generative model and the privacy metrics.
arXiv Detail & Related papers (2023-12-08T15:42:28Z) - Achilles' Heels: Vulnerable Record Identification in Synthetic Data
Publishing [9.061271587514215]
We propose a principled vulnerable record identification technique for synthetic data publishing.
We show that it strongly outperforms previous ad-hoc methods across datasets and generators.
We show that it continues to accurately identify vulnerable records even when synthetic data generators are made differentially private.
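One common distance-based heuristic for flagging at-risk records before release (a simplified sketch, not necessarily the paper's exact technique) is to rank records by how isolated they are in the original data:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def vulnerability_ranking(data, k=5):
    """Rank records by isolation: mean distance to the k nearest other records.
    Isolated records are the ones a membership or linkage attack is most likely to expose."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(data)
    dist, _ = nn.kneighbors(data)            # column 0 is the record itself
    isolation = dist[:, 1:].mean(axis=1)
    return np.argsort(isolation)[::-1]       # most isolated (most vulnerable) first

rng = np.random.default_rng(2)
data = rng.normal(size=(1000, 6))
data[42] += 8.0                              # plant an obvious outlier
print("top-5 candidate vulnerable records:", vulnerability_ranking(data)[:5])
```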
arXiv Detail & Related papers (2023-06-17T09:42:46Z) - Towards Generalizable Data Protection With Transferable Unlearnable
Examples [50.628011208660645]
We present a novel, generalizable data protection method by generating transferable unlearnable examples.
To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution.
arXiv Detail & Related papers (2023-05-18T04:17:01Z) - Membership Inference Attacks against Synthetic Data through Overfitting
Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
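The core scoring rule behind DOMIAS is a density ratio: a target record is more likely to be a training member if the density of the synthetic data around it exceeds the density of a reference (population) sample. A minimal kernel-density sketch of that idea, with an illustrative bandwidth and toy data rather than the paper's implementation:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def domias_style_score(synthetic, reference, targets, bandwidth=0.5):
    """Membership score = log p_synth(x) - log p_ref(x).
    Local overfitting of the generator inflates p_synth around training records."""
    kde_synth = KernelDensity(bandwidth=bandwidth).fit(synthetic)
    kde_ref = KernelDensity(bandwidth=bandwidth).fit(reference)
    return kde_synth.score_samples(targets) - kde_ref.score_samples(targets)

rng = np.random.default_rng(3)
reference = rng.normal(size=(2000, 3))          # attacker's knowledge of the population
train = rng.normal(size=(200, 3))
synthetic = train[rng.integers(0, 200, 1000)] + rng.normal(scale=0.2, size=(1000, 3))

members, non_members = train[:50], rng.normal(size=(50, 3))
s_in = domias_style_score(synthetic, reference, members)
s_out = domias_style_score(synthetic, reference, non_members)
print("mean score members:", s_in.mean(), "non-members:", s_out.mean())
```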
arXiv Detail & Related papers (2023-02-24T11:27:39Z) - A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data [1.5293427903448022]
We introduce a new attribute inference attack against synthetic data.
We show that our attack can be highly accurate even on arbitrary records.
We then evaluate the tradeoff between protecting privacy and preserving statistical utility.
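The general principle of linear reconstruction is to treat released aggregate statistics as noisy linear equations over the unknown sensitive values and solve them, for example by least squares. The sketch below illustrates this principle on noisy counting queries (a Dinur-Nissim-style setup with made-up parameters), rather than the paper's specific attack against synthetic data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
secret = rng.integers(0, 2, size=n)              # hidden sensitive bit per record

# Adversary observes noisy answers to random subset-sum (counting) queries,
# i.e. statistics that the released data preserves approximately.
m = 800
queries = rng.integers(0, 2, size=(m, n))        # which records each query covers
answers = queries @ secret + rng.normal(scale=2.0, size=m)

# Linear reconstruction: least-squares estimate of the hidden bits, then round.
estimate, *_ = np.linalg.lstsq(queries, answers, rcond=None)
recovered = (estimate > 0.5).astype(int)
print("fraction of sensitive bits recovered:", (recovered == secret).mean())
```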
arXiv Detail & Related papers (2023-01-24T14:56:36Z) - The Privacy Onion Effect: Memorization is Relative [76.46529413546725]
We show an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable exposes a new layer of previously-safe points to the same attack.
It suggests that privacy-enhancing technologies such as machine unlearning could actually harm the privacy of other users.
arXiv Detail & Related papers (2022-06-21T15:25:56Z) - Defending against Reconstruction Attacks with Rényi Differential
Privacy [72.1188520352079]
Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model.
Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget.
We show that, for the same mechanism, we can derive privacy guarantees for reconstruction attacks that are better than the traditional ones from the literature.
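For context on the accounting involved: the Gaussian mechanism with noise scale sigma on a sensitivity-Delta query satisfies (alpha, alpha*Delta^2/(2*sigma^2))-Rényi DP, and any (alpha, rho)-RDP guarantee implies (rho + log(1/delta)/(alpha-1), delta)-DP. The snippet below simply evaluates this standard conversion over a grid of alpha values; it illustrates the style of guarantee the paper sharpens for reconstruction attacks, not the paper's own bounds.

```python
import numpy as np

def gaussian_rdp_to_dp(sigma, delta, sensitivity=1.0, alphas=np.arange(1.1, 200, 0.1)):
    """Standard conversion: the Gaussian mechanism is (alpha, alpha*Delta^2/(2*sigma^2))-RDP,
    and (alpha, rho)-RDP implies (rho + log(1/delta)/(alpha-1), delta)-DP. Pick the best alpha."""
    rdp = alphas * sensitivity**2 / (2 * sigma**2)
    eps = rdp + np.log(1 / delta) / (alphas - 1)
    return eps.min()

for sigma in (1.0, 5.0, 20.0):
    print(f"sigma={sigma:5.1f} -> eps={gaussian_rdp_to_dp(sigma, delta=1e-5):.3f}")
```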
arXiv Detail & Related papers (2022-02-15T18:09:30Z) - Preventing Unauthorized Use of Proprietary Data: Poisoning for Secure
Dataset Release [52.504589728136615]
We develop a data poisoning method by which publicly released data can be minimally modified to prevent others from training models on it.
We demonstrate the success of our approach on ImageNet classification and on facial recognition.
arXiv Detail & Related papers (2021-02-16T19:12:34Z) - Privacy and Robustness in Federated Learning: Attacks and Defenses [74.62641494122988]
We conduct the first comprehensive survey on this topic.
Through a concise introduction to the concept of FL, and a unique taxonomy covering: 1) threat models; 2) poisoning attacks and defenses against robustness; 3) inference attacks and defenses against privacy, we provide an accessible review of this important topic.
arXiv Detail & Related papers (2020-12-07T12:11:45Z) - Synthetic Data -- Anonymisation Groundhog Day [4.694549066382216]
We present the first quantitative evaluation of the privacy gain of synthetic data publishing.
We show that synthetic data either does not prevent inference attacks or does not retain data utility.
In contrast to traditional anonymisation, the privacy-utility tradeoff of synthetic data publishing is hard to predict.
arXiv Detail & Related papers (2020-11-13T16:58:42Z)