Differentially Private Non Parametric Copulas: Generating synthetic data with non parametric copulas under privacy guarantees
- URL: http://arxiv.org/abs/2409.18611v1
- Date: Fri, 27 Sep 2024 10:18:14 GMT
- Title: Differentially Private Non Parametric Copulas: Generating synthetic data with non parametric copulas under privacy guarantees
- Authors: Pablo A. Osorio-Marulanda, John Esteban Castro Ramirez, Mikel Hernández Jiménez, Nicolas Moreno Reyes, Gorka Epelde Unanue
- Abstract summary: This work focuses on enhancing a non-parametric copula-based synthetic data generation model, DPNPC, by incorporating Differential Privacy.
We compare DPNPC with three other models (PrivBayes, DP-Copula, and DP-Histogram) across three public datasets, evaluating privacy, utility, and execution time.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The creation of synthetic data models has represented a significant advancement across diverse scientific fields, but this technology also brings important privacy considerations for users. This work focuses on enhancing a non-parametric copula-based synthetic data generation model, DPNPC, by incorporating Differential Privacy through an Enhanced Fourier Perturbation method. The model generates synthetic data for mixed tabular databases while preserving privacy. We compare DPNPC with three other models (PrivBayes, DP-Copula, and DP-Histogram) across three public datasets, evaluating privacy, utility, and execution time. DPNPC outperforms the others in modeling multivariate dependencies, maintaining privacy for small $\epsilon$ values, and reducing training times. However, limitations include the need to assess the model's performance with different encoding methods and to consider additional privacy attacks. Future research should address these areas to enhance privacy-preserving synthetic data generation.
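To make the pipeline concrete, here is a minimal Python sketch of copula-based DP synthesis. It is not the authors' implementation: plain Laplace noise on an empirical copula histogram stands in for the Enhanced Fourier Perturbation method, the marginals are inverted non-privately for brevity, and all parameter values are illustrative.

```python
# Minimal sketch of DP synthetic data from an empirical copula.
# NOTE: plain Laplace noise stands in for the paper's Enhanced Fourier
# Perturbation, and the quantile inversion below is not privatized.
import numpy as np

def dp_copula_synth(data, epsilon=1.0, bins=10, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # 1. Pseudo-observations: map each column to ranks in (0, 1).
    u = (data.argsort(axis=0).argsort(axis=0) + 1) / (n + 1)
    # 2. Estimate the copula with a d-dimensional histogram
    #    (practical only for a handful of columns).
    hist, _ = np.histogramdd(u, bins=bins, range=[(0.0, 1.0)] * d)
    # 3. Laplace mechanism: adding or removing one record changes one
    #    cell count by 1, so sensitivity 1 yields epsilon-DP counts.
    noisy = hist + rng.laplace(scale=1.0 / epsilon, size=hist.shape)
    probs = np.clip(noisy, 0.0, None).ravel()
    probs /= probs.sum()
    # 4. Sample grid cells, then jitter uniformly inside each cell.
    cells = rng.choice(probs.size, size=n_samples, p=probs)
    idx = np.stack(np.unravel_index(cells, hist.shape), axis=1)
    u_new = (idx + rng.uniform(size=idx.shape)) / bins
    # 5. Push the copula samples back through the empirical quantiles.
    return np.column_stack(
        [np.quantile(data[:, j], u_new[:, j]) for j in range(d)]
    )
```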
Related papers
- Differentially Private Fine-Tuning of Diffusion Models [22.454127503937883]
The integration of Differential Privacy with diffusion models (DMs) presents a promising yet challenging frontier.
Recent developments in this field have highlighted the potential for generating high-quality synthetic data by pre-training on public data.
We propose a strategy optimized for private diffusion models, which minimizes the number of trainable parameters to enhance the privacy-utility trade-off.
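As a rough illustration of that strategy, the sketch below freezes a backbone and applies DP-SGD only to a small trainable head, using the Opacus library; the model, data, and hyperparameters are placeholders rather than the paper's setup.

```python
# Hypothetical sketch: DP-SGD over a small trainable head only.
# The backbone stands in for a pretrained network; all values are toy.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

backbone = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
head = nn.Linear(64, 10)  # the only part DP-SGD will update
for p in backbone.parameters():
    p.requires_grad = False  # fewer trainable params -> noise hurts less

model = nn.Sequential(backbone, head)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
loader = DataLoader(
    TensorDataset(torch.randn(256, 64), torch.randint(0, 10, (256,))),
    batch_size=32,
)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.0, max_grad_norm=1.0,  # illustrative settings
)

loss_fn = nn.CrossEntropyLoss()
for x, y in loader:  # one private epoch
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
```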
arXiv Detail & Related papers (2024-06-03T14:18:04Z)
- Quantifying and Mitigating Privacy Risks for Tabular Generative Models [13.153278585144355]
Synthetic data from generative models emerges as a privacy-preserving data-sharing solution.
We propose DP-TLDM, a Differentially Private Tabular Latent Diffusion Model.
We show that DP-TLDM improves the synthetic quality by an average of 35% in data resemblance, 15% in the utility for downstream tasks, and 50% in data discriminability.
arXiv Detail & Related papers (2024-03-12T17:27:49Z)
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT encounters limitations such as the scarcity of instruction data and the risk of training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
- Scaling While Privacy Preserving: A Comprehensive Synthetic Tabular Data Generation and Evaluation in Learning Analytics [0.412484724941528]
Privacy poses a significant obstacle to the progress of learning analytics (LA), presenting challenges like inadequate anonymization and data misuse.
Synthetic data emerges as a potential remedy, offering robust privacy protection.
Prior LA research on synthetic data lacks thorough evaluation, which is essential for assessing the delicate balance between privacy and data utility.
arXiv Detail & Related papers (2024-01-12T20:27:55Z)
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns is subject to stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy-sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
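For reference, the guarantee behind such sanitized releases is the standard definition: a randomized mechanism $M$ is $(\varepsilon, \delta)$-differentially private if, for all neighboring datasets $D, D'$ differing in one record and every measurable output set $S$,
$$\Pr[M(D) \in S] \le e^{\varepsilon}\,\Pr[M(D') \in S] + \delta.$$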
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve selective differential privacy (SDP) for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
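As background on the objective, here is a generic, self-contained Sinkhorn distance between a generated batch and a real batch; it is not the paper's DP-Sinkhorn training loop (no privacy noise or debiasing is included), and the regularization strength and iteration count are illustrative.

```python
# Generic entropy-regularized Sinkhorn distance between two batches.
# Not DP-Sinkhorn itself: no gradient privatization is performed here.
import numpy as np

def sinkhorn_cost(x, y, reg=0.1, n_iters=200):
    # Pairwise squared-Euclidean costs between samples.
    c = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    k = np.exp(-c / reg)  # Gibbs kernel (log-domain updates are more stable)
    a = np.full(len(x), 1.0 / len(x))  # uniform source marginal
    b = np.full(len(y), 1.0 / len(y))  # uniform target marginal
    u = np.ones_like(a)
    for _ in range(n_iters):  # Sinkhorn-Knopp scaling iterations
        v = b / (k.T @ u)
        u = a / (k @ v)
    plan = u[:, None] * k * v[None, :]  # entropic transport plan
    return (plan * c).sum()  # transport cost under the plan

rng = np.random.default_rng(0)
print(sinkhorn_cost(rng.normal(size=(64, 2)), rng.normal(size=(64, 2))))
```

The debiased Sinkhorn divergence typically used for generative training combines three such terms, $S(x, y) = W(x, y) - \tfrac{1}{2}W(x, x) - \tfrac{1}{2}W(y, y)$.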
arXiv Detail & Related papers (2021-11-01T18:10:21Z)
- Differentially Private Synthetic Medical Data Generation using Convolutional GANs [7.2372051099165065]
We develop a differentially private framework for synthetic data generation using Rényi differential privacy.
Our approach builds on convolutional autoencoders and convolutional generative adversarial networks to preserve some of the critical characteristics of the generated synthetic data.
We demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget.
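For context, Rényi differential privacy converts to the standard notion via the usual relation: a mechanism satisfying $(\alpha, \varepsilon)$-RDP also satisfies, for any $\delta \in (0, 1)$,
$$\Big(\varepsilon + \frac{\log(1/\delta)}{\alpha - 1},\ \delta\Big)\text{-differential privacy.}$$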
arXiv Detail & Related papers (2020-12-22T01:03:49Z)
- P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model [23.91327154831855]
This paper proposes privacy-preserving phased generative model (P3GM) for releasing sensitive data.
P3GM employs a two-phase learning process to make it robust against noise and to increase learning efficiency.
Compared with state-of-the-art methods, our generated samples exhibit less noise and are closer to the original data in terms of data diversity.
arXiv Detail & Related papers (2020-06-22T09:47:54Z)
- Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
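To illustrate the smoothing idea, the hypothetical sketch below denoises a noisy gradient vector by solving $(I + \sigma L)\,g = \tilde g$ with a circulant 1-D Laplacian $L$ via the FFT; the noise scale and smoothing strength are illustrative, and this is a generic denoising step rather than the paper's full federated pipeline.

```python
# Laplacian smoothing of a noisy gradient via an FFT-based solve.
# Illustrative stand-in: Gaussian noise mimics a DP perturbation.
import numpy as np

def laplacian_smooth(grad, sigma=1.0):
    n = grad.size
    # Eigenvalues of the circulant 1-D Laplacian: 2 - 2*cos(2*pi*k/n).
    eig = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n) / n)
    # Solve (I + sigma * L) g_smooth = grad in the Fourier domain.
    return np.real(np.fft.ifft(np.fft.fft(grad) / (1.0 + sigma * eig)))

rng = np.random.default_rng(0)
true_grad = np.sin(np.linspace(0, 4 * np.pi, 512))   # toy gradient signal
noisy = true_grad + rng.normal(scale=0.5, size=512)  # DP-style noise
smoothed = laplacian_smooth(noisy)
print(np.linalg.norm(noisy - true_grad), np.linalg.norm(smoothed - true_grad))
```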
arXiv Detail & Related papers (2020-05-01T04:28:38Z)