Membership Inference Attacks against Synthetic Data through Overfitting Detection
- URL: http://arxiv.org/abs/2302.12580v1
- Date: Fri, 24 Feb 2023 11:27:39 GMT
- Title: Membership Inference Attacks against Synthetic Data through Overfitting Detection
- Authors: Boris van Breugel, Hao Sun, Zhaozhi Qian, Mihaela van der Schaar
- Abstract summary: We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
- Score: 84.02632160692995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data is the foundation of most science. Unfortunately, sharing data can be
obstructed by the risk of violating data privacy, impeding research in fields
like healthcare. Synthetic data is a potential solution. It aims to generate
data that has the same distribution as the original data, but that does not
disclose information about individuals. Membership Inference Attacks (MIAs) are
a common class of privacy attack, in which the attacker attempts to determine whether
a particular real sample was used to train the model. Previous works that
propose MIAs against generative models either display low performance -- giving
the false impression that data is highly private -- or need to assume access to
internal generative model parameters -- a relatively low-risk scenario, as the
data publisher often only releases synthetic data, not the model. In this work
we argue for a realistic MIA setting that assumes the attacker has some
knowledge of the underlying data distribution. We propose DOMIAS, a
density-based MIA model that aims to infer membership by targeting local
overfitting of the generative model. Experimentally we show that DOMIAS is
significantly more successful at MIA than previous work, especially at
attacking uncommon samples. The latter is disconcerting since these samples may
correspond to underrepresented groups. We also demonstrate how DOMIAS' MIA
performance score provides an interpretable metric for privacy, giving data
publishers a new tool for achieving the desired privacy-utility trade-off in
their synthetic data.
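One way to read the density-based idea above is as a density-ratio test: a record is suspicious when the generator assigns it much more density than the underlying data distribution does. The snippet below is a minimal sketch under simplifying assumptions (kernel density estimators, a shared bandwidth, a zero log-ratio decision threshold, and a hypothetical attacker reference sample); it illustrates the general idea, not the paper's implementation.

```python
# Minimal sketch of a density-ratio membership score. The kernel density
# estimators, shared bandwidth, and 0.0 log-ratio threshold are simplifying
# assumptions, not the paper's implementation.
import numpy as np
from sklearn.neighbors import KernelDensity

def membership_scores(x_query, x_synthetic, x_reference, bandwidth=0.5):
    """Return log p_synth(x) - log p_ref(x) for each query record.

    Large values mean the generator puts unusually high density on x
    relative to the underlying distribution, i.e. local overfitting.
    """
    kde_synth = KernelDensity(bandwidth=bandwidth).fit(x_synthetic)
    kde_ref = KernelDensity(bandwidth=bandwidth).fit(x_reference)
    return kde_synth.score_samples(x_query) - kde_ref.score_samples(x_query)

# Toy usage with random data standing in for the released synthetic records,
# the attacker's reference sample, and the records being queried.
rng = np.random.default_rng(0)
x_syn = rng.normal(size=(500, 4))
x_ref = rng.normal(size=(500, 4))
x_query = rng.normal(size=(10, 4))
predicted_member = membership_scores(x_query, x_syn, x_ref) > 0.0
```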
Related papers
- The Inadequacy of Similarity-based Privacy Metrics: Privacy Attacks against "Truly Anonymous" Synthetic Datasets [12.730435519914415]
We examine the privacy metrics used in real-world synthetic data deployments and demonstrate their unreliability in several ways.
We introduce ReconSyn, a reconstruction attack that generates multiple synthetic datasets that the metrics deem private but that actually leak information unique to individual records.
We show that ReconSyn recovers 78-100% of the outliers in the training data with only black-box access to a single fitted generative model and the privacy metrics.
arXiv Detail & Related papers (2023-12-08T15:42:28Z)
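For context on what "similarity-based privacy metrics" typically look like in practice, the sketch below implements a distance-to-closest-record (DCR) style check, one common example of such a metric; the Euclidean distance and threshold value are illustrative assumptions. The cited paper's point is that passing a check of this kind does not by itself prevent leakage.

```python
# Illustrative distance-to-closest-record (DCR) style check, one common
# similarity-based privacy metric; the Euclidean distance and threshold
# are assumptions for the example, not a recommendation.
import numpy as np

def distance_to_closest_record(synthetic, real):
    """Distance from each synthetic record to its nearest real record."""
    diffs = synthetic[:, None, :] - real[None, :, :]   # (n_syn, n_real, d)
    return np.linalg.norm(diffs, axis=-1).min(axis=1)

def passes_dcr_check(synthetic, real, threshold=0.1):
    # Declares the release "private" if no synthetic record sits too close
    # to any real record -- the kind of check ReconSyn shows can be gamed.
    return distance_to_closest_record(synthetic, real).min() > threshold
```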
- Preserving Privacy in GANs Against Membership Inference Attack [30.668589815716775]
Generative Adversarial Networks (GANs) have been widely used for generating synthetic data.
Recent works showed that GANs might leak information about their training data samples.
This makes GANs vulnerable to Membership Inference Attacks (MIAs).
arXiv Detail & Related papers (2023-11-06T15:04:48Z)
- Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and examine the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z)
- Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z)
- Achilles' Heels: Vulnerable Record Identification in Synthetic Data Publishing [9.061271587514215]
We propose a principled vulnerable record identification technique for synthetic data publishing.
We show that it strongly outperforms previous ad-hoc methods across datasets and generators.
We show that it accurately identifies vulnerable records even when synthetic data generators are made differentially private.
arXiv Detail & Related papers (2023-06-17T09:42:46Z)
- An Empirical Study on the Membership Inference Attack against Tabular Data Synthesis Models [12.878704876264317]
Tabular data synthesis models are popular because they can trade off data utility against privacy.
Recent research has shown that generative models for image data are susceptible to the membership inference attack.
We conduct experiments to evaluate how well two popular differentially-private deep learning training algorithms, DP-SGD and DP-GAN, can protect the models against the attack.
arXiv Detail & Related papers (2022-08-17T07:09:08Z)
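Since the preceding entry evaluates DP-SGD, a brief sketch of the mechanism may help: DP-SGD clips every per-example gradient and adds Gaussian noise before averaging into the update. The values below (learning rate, clipping norm, noise multiplier) are placeholders, and per-example gradient computation is abstracted away and assumed to be supplied by the caller.

```python
# Sketch of one DP-SGD update: clip each per-example gradient, sum, add
# Gaussian noise, then average. Hyperparameter values are placeholders,
# and per-example gradients are assumed to be supplied by the caller.
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=np.random.default_rng(0)):
    clipped = []
    for g in per_example_grads:                       # one gradient per example
        scale = min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        clipped.append(g * scale)
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)
```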
- Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
- Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)
- How Does Data Augmentation Affect Privacy in Machine Learning? [94.52721115660626]
We propose new MI attacks that utilize the information in augmented data.
We establish the optimal membership inference when the model is trained with augmented data.
arXiv Detail & Related papers (2020-07-21T02:21:10Z)
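As a rough illustration of how augmentation information can enter a membership test, the sketch below pools a model's loss over several augmented views of a record and thresholds the average; this is a simplified, loss-threshold-style variant in the spirit of the entry above, not the paper's exact attack, and augment() and loss_fn() are hypothetical stand-ins.

```python
# Simplified membership test that pools the target model's loss over several
# augmented views of a record; augment() and loss_fn() are hypothetical
# stand-ins for the augmentation pipeline and the model's per-example loss.
import numpy as np

def infer_membership(sample, label, loss_fn, augment, n_views=8, threshold=0.5):
    losses = [loss_fn(augment(sample), label) for _ in range(n_views)]
    # Training members tend to keep a low loss even on augmented views.
    return float(np.mean(losses)) < threshold
```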
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.