Related papers: Understanding the Impact of Data Domain Extraction on Synthetic Data Privacy

Understanding the Impact of Data Domain Extraction on Synthetic Data Privacy

URL: http://arxiv.org/abs/2504.08254v2
Date: Mon, 14 Apr 2025 02:34:24 GMT
Title: Understanding the Impact of Data Domain Extraction on Synthetic Data Privacy
Authors: Georgi Ganev, Meenatchi Sundaram Muthu Selva Annamalai, Sofiane Mahiou, Emiliano De Cristofaro,
Abstract summary: Privacy attacks, particularly membership inference attacks (MIAs), are widely used to assess the privacy of generative models for synthetic data.<n>These attacks often exploit outliers, which are especially vulnerable due to their position at the boundaries of the data domain.<n>This paper examines the role of data domain extraction in generative models and its impact on privacy attacks.
Score: 10.893644207618825
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Privacy attacks, particularly membership inference attacks (MIAs), are widely used to assess the privacy of generative models for tabular synthetic data, including those with Differential Privacy (DP) guarantees. These attacks often exploit outliers, which are especially vulnerable due to their position at the boundaries of the data domain (e.g., at the minimum and maximum values). However, the role of data domain extraction in generative models and its impact on privacy attacks have been overlooked. In this paper, we examine three strategies for defining the data domain: assuming it is externally provided (ideally from public data), extracting it directly from the input data, and extracting it with DP mechanisms. While common in popular implementations and libraries, we show that the second approach breaks end-to-end DP guarantees and leaves models vulnerable. While using a provided domain (if representative) is preferable, extracting it with DP can also defend against popular MIAs, even at high privacy budgets.

Related papers

Beyond the Worst Case: Extending Differential Privacy Guarantees to Realistic Adversaries [17.780319275883127]
Differential Privacy is a family of definitions that bound the worst-case privacy leakage of a mechanism.<n>This work sheds light on what the worst-case guarantee of DP implies about the success of attackers that are more representative of real-world privacy risks.
arXiv Detail & Related papers (2025-07-10T20:36:31Z)
Machine Learning with Privacy for Protected Attributes [56.44253915927481]
We refine the definition of differential privacy (DP) to create a more general and flexible framework that we call feature differential privacy (FDP)<n>Our definition is simulation-based and allows for both addition/removal and replacement variants of privacy, and can handle arbitrary separation of protected and non-protected features.<n>We apply our framework to various machine learning tasks and show that it can significantly improve the utility of DP-trained models when public features are available.
arXiv Detail & Related papers (2025-06-24T17:53:28Z)
Enforcing Demographic Coherence: A Harms Aware Framework for Reasoning about Private Data Release [14.939460540040459]
We introduce demographic coherence, a condition inspired by privacy attacks that we argue is necessary for data privacy.<n>Our framework focuses on confidence rated predictors, which can in turn be distilled from almost any data-informed process.<n>We prove that every differentially private data release is also demographically coherent, and that there are demographically coherent algorithms which are not differentially private.
arXiv Detail & Related papers (2025-02-04T20:42:30Z)
Bayes-Nash Generative Privacy Against Membership Inference Attacks [24.330984323956173]
Membership inference attacks (MIAs) expose significant privacy risks by determining whether an individual's data is in a dataset.<n>We propose a game-theoretic framework that models privacy protection from MIA as a Bayesian game between a defender and an attacker.<n>We call the defender's data sharing policy thereby obtained Bayes-Nash Generative Privacy (BNGP)
arXiv Detail & Related papers (2024-10-09T20:29:04Z)
DP-TLDM: Differentially Private Tabular Latent Diffusion Model [13.153278585144355]
We propose DPTLDM, Differentially Private Tabular Latent Diffusion Model, to keep high data quality and low privacy risk of synthetic data tables.<n>We show that DPTLDM improves the synthetic quality by an average of 35% in data resemblance, 15% in the utility for downstream tasks, and 50% in data discriminability.
arXiv Detail & Related papers (2024-03-12T17:27:49Z)
FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution, by consolidating collaborative training across multiple data owners. FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks. We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset. We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
arXiv Detail & Related papers (2024-03-07T21:22:07Z)
Conciliating Privacy and Utility in Data Releases via Individual Differential Privacy and Microaggregation [4.287502453001108]
$epsilon$-Differential privacy (DP) is a well-known privacy model that offers strong privacy guarantees. We propose $epsilon$-individual differential privacy (iDP), which causes less data distortion while providing the same protection as DP to subjects. We report on experiments that show how our approach can provide strong privacy (small $epsilon$) while yielding protected data that do not significantly degrade the accuracy of secondary data analysis.
arXiv Detail & Related papers (2023-12-21T10:23:18Z)
PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind) Our work offers a theoretical analysis for model design and benchmarks various techniques. In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution. We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases. splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget. We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models. Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.