TAPAS: a Toolbox for Adversarial Privacy Auditing of Synthetic Data
- URL: http://arxiv.org/abs/2211.06550v1
- Date: Sat, 12 Nov 2022 02:26:54 GMT
- Title: TAPAS: a Toolbox for Adversarial Privacy Auditing of Synthetic Data
- Authors: Florimond Houssiau, James Jordon, Samuel N. Cohen, Owen Daniel, Andrew
Elliott, James Geddes, Callum Mole, Camila Rangel-Smith, Lukasz Szpruch
- Abstract summary: We present TAPAS, a toolbox of attacks to evaluate synthetic data privacy under a wide range of scenarios.
These attacks include generalizations of prior works and novel attacks.
We also introduce a general framework for reasoning about privacy threats to synthetic data and showcase TAPAS on several examples.
- Score: 12.541414528872544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personal data collected at scale promises to improve decision-making and
accelerate innovation. However, sharing and using such data raises serious
privacy concerns. A promising solution is to produce synthetic data, artificial
records to share instead of real data. Since synthetic records are not linked
to real persons, this intuitively prevents classical re-identification attacks.
However, this is insufficient to protect privacy. We here present TAPAS, a
toolbox of attacks to evaluate synthetic data privacy under a wide range of
scenarios. These attacks include generalizations of prior works and novel
attacks. We also introduce a general framework for reasoning about privacy
threats to synthetic data and showcase TAPAS on several examples.
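As an illustration of the kind of adversarial audit a toolbox like TAPAS automates, the following is a minimal sketch of a naive distance-based membership inference test against a synthetic table. This is not TAPAS's actual API; all function names and the threshold are hypothetical, chosen only to show the shape of such an attack.

```python
import numpy as np

def mia_distance_score(target, synthetic):
    """Distance from a target record to its nearest synthetic record.

    A very small distance is (weak) evidence that the target record was
    in the training data -- a classic, naive membership-inference signal.
    """
    diffs = synthetic - target            # broadcast over synthetic rows
    dists = np.linalg.norm(diffs, axis=1)
    return dists.min()

def membership_guess(target, synthetic, threshold):
    """Guess 'member' iff the nearest synthetic record is closer than threshold."""
    return mia_distance_score(target, synthetic) < threshold

# Toy example: a generator that (badly) near-copied one training record.
rng = np.random.default_rng(0)
synthetic = rng.normal(size=(100, 3))
member = synthetic[0] + 0.01 * rng.normal(size=3)   # near-duplicate of a synthetic row
non_member = np.array([10.0, 10.0, 10.0])           # far from all synthetic records

print(membership_guess(member, synthetic, threshold=0.5))      # True
print(membership_guess(non_member, synthetic, threshold=0.5))  # False
```

Real attacks in the literature (and in TAPAS) are considerably more refined, but even this sketch shows why "synthetic records are not real people" is not by itself a privacy guarantee.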
Related papers
- Synthetic Data Outliers: Navigating Identity Disclosure [3.8811062755861956]
We analyze the privacy of synthetic data with respect to outliers.
Our main findings suggest that re-identification of outliers via linkage attacks is feasible and easily achieved.
Additional safeguards such as differential privacy can prevent re-identification, albeit at the expense of the data utility.
arXiv Detail & Related papers (2024-06-04T19:35:44Z)
- On the Inadequacy of Similarity-based Privacy Metrics: Reconstruction Attacks against "Truly Anonymous Synthetic Data" [15.0393231456773]
We review the privacy metrics offered by leading companies in this space and shed light on a few critical flaws in reasoning about privacy entirely via empirical evaluations.
We present a reconstruction attack, ReconSyn, which successfully recovers (i.e., leaks all attributes of) at least 78% of the low-density train records (or outliers) with only black-box access to a single fitted generative model and the privacy metrics.
arXiv Detail & Related papers (2023-12-08T15:42:28Z)
- Achilles' Heels: Vulnerable Record Identification in Synthetic Data Publishing [9.061271587514215]
We propose a principled vulnerable record identification technique for synthetic data publishing.
We show it to strongly outperform previous ad-hoc methods across datasets and generators.
We show it to accurately identify vulnerable records when synthetic data generators are made differentially private.
arXiv Detail & Related papers (2023-06-17T09:42:46Z)
- Towards Generalizable Data Protection With Transferable Unlearnable Examples [50.628011208660645]
We present a novel, generalizable data protection method by generating transferable unlearnable examples.
To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution.
arXiv Detail & Related papers (2023-05-18T04:17:01Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic membership inference attack (MIA) setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data [1.5293427903448022]
We introduce a new attribute inference attack against synthetic data.
We show that our attack can be highly accurate even on arbitrary records.
We then evaluate the tradeoff between protecting privacy and preserving statistical utility.
arXiv Detail & Related papers (2023-01-24T14:56:36Z)
- The Privacy Onion Effect: Memorization is Relative [76.46529413546725]
We show an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable exposes a new layer of previously-safe points to the same attack.
It suggests that privacy-enhancing technologies such as machine unlearning could actually harm the privacy of other users.
arXiv Detail & Related papers (2022-06-21T15:25:56Z)
- Defending against Reconstruction Attacks with Rényi Differential Privacy [72.1188520352079]
Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model.
Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget.
We show that, for the same mechanism, we can derive privacy guarantees against reconstruction attacks that are better than the traditional ones from the literature.
arXiv Detail & Related papers (2022-02-15T18:09:30Z)
- Preventing Unauthorized Use of Proprietary Data: Poisoning for Secure Dataset Release [52.504589728136615]
We develop a data poisoning method by which publicly released data can be minimally modified to prevent others from training models on it.
We demonstrate the success of our approach on ImageNet classification and on facial recognition.
arXiv Detail & Related papers (2021-02-16T19:12:34Z)
- Privacy and Robustness in Federated Learning: Attacks and Defenses [74.62641494122988]
We conduct the first comprehensive survey on this topic.
Through a concise introduction to the concept of FL, and a unique taxonomy covering: 1) threat models; 2) poisoning attacks and defenses against robustness; 3) inference attacks and defenses against privacy, we provide an accessible review of this important topic.
arXiv Detail & Related papers (2020-12-07T12:11:45Z)
- Synthetic Data -- Anonymisation Groundhog Day [4.694549066382216]
We present the first quantitative evaluation of the privacy gain of synthetic data publishing.
We show that synthetic data either does not prevent inference attacks or does not retain data utility.
In contrast to traditional anonymisation, the privacy-utility tradeoff of synthetic data publishing is hard to predict.
arXiv Detail & Related papers (2020-11-13T16:58:42Z)
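Several of the papers above note that differential privacy can prevent re-identification, but at a cost in utility. The standard Laplace mechanism makes this tradeoff concrete; the sketch below is illustrative and not code from any of these papers.

```python
import numpy as np

def laplace_mechanism(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with epsilon-differential privacy via Laplace noise.

    For a counting query, adding or removing one record changes the answer
    by at most `sensitivity` (= 1), so Laplace noise with scale
    sensitivity / epsilon yields an epsilon-DP release.
    """
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

rng = np.random.default_rng(42)
true_count = 1000

# Smaller epsilon = stronger privacy = noisier (less useful) answers.
for eps in (10.0, 1.0, 0.1):
    noisy = laplace_mechanism(true_count, eps, rng=rng)
    print(f"epsilon={eps:>4}: released count = {noisy:.1f}")
```

The loop makes the privacy-utility tension visible: at epsilon = 10 the released count is essentially exact, while at epsilon = 0.1 the noise scale is 10, and the answer can be far from the truth.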
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.