Differentially Private Federated Learning of Diffusion Models for Synthetic Tabular Data Generation
- URL: http://arxiv.org/abs/2412.16083v1
- Date: Fri, 20 Dec 2024 17:30:58 GMT
- Title: Differentially Private Federated Learning of Diffusion Models for Synthetic Tabular Data Generation
- Authors: Timur Sattarov, Marco Schreyer, Damian Borth,
- Abstract summary: We introduce DP-Fed-FinDiff framework, a novel integration of Differential Privacy, Federated Learning and Denoising Diffusion Probabilistic Models.
We demonstrate the effectiveness of DP-Fed-FinDiff on multiple real-world financial datasets.
The results affirm the potential of DP-Fed-FinDiff to enable secure data sharing and robust analytics in highly regulated domains.
- Score: 5.182014186927255
- License:
- Abstract: The increasing demand for privacy-preserving data analytics in finance necessitates solutions for synthetic data generation that rigorously uphold privacy standards. We introduce DP-Fed-FinDiff framework, a novel integration of Differential Privacy, Federated Learning and Denoising Diffusion Probabilistic Models designed to generate high-fidelity synthetic tabular data. This framework ensures compliance with stringent privacy regulations while maintaining data utility. We demonstrate the effectiveness of DP-Fed-FinDiff on multiple real-world financial datasets, achieving significant improvements in privacy guarantees without compromising data quality. Our empirical evaluations reveal the optimal trade-offs between privacy budgets, client configurations, and federated optimization strategies. The results affirm the potential of DP-Fed-FinDiff to enable secure data sharing and robust analytics in highly regulated domains, paving the way for further advances in federated learning and privacy-preserving data synthesis.
Related papers
- SafeSynthDP: Leveraging Large Language Models for Privacy-Preserving Synthetic Data Generation Using Differential Privacy [0.0]
We investigate capability of Large Language Models (Ms) to generate synthetic datasets with Differential Privacy (DP) mechanisms.
Our approach incorporates DP-based noise injection methods, including Laplace and Gaussian distributions, into the data generation process.
We then evaluate the utility of these DP-enhanced synthetic datasets by comparing the performance of ML models trained on them against models trained on the original data.
arXiv Detail & Related papers (2024-12-30T01:10:10Z) - DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing [0.8739101659113155]
We introduce an effective data publishing algorithm emphDP-CDA.
Our proposed algorithm generates synthetic datasets by randomly mixing data in a class-specific manner, and inducing carefully-tuned randomness to ensure privacy guarantees.
Our results indicate that synthetic datasets produced using the DP-CDA can achieve superior utility compared to those generated by traditional data publishing algorithms, even when subject to the same privacy requirements.
arXiv Detail & Related papers (2024-11-25T06:14:06Z) - SAFES: Sequential Privacy and Fairness Enhancing Data Synthesis for Responsible AI [3.0445044300235535]
We introduce SAFES, a Sequential PrivAcy and Fairness Enhancing data Synthesis procedure.
For reasonable privacy loss, SAFES-generated synthetic data achieve significantly improved fairness metrics with relatively low utility loss.
arXiv Detail & Related papers (2024-11-14T04:36:12Z) - Convergent Differential Privacy Analysis for General Federated Learning: the $f$-DP Perspective [57.35402286842029]
Federated learning (FL) is an efficient collaborative training paradigm with a focus on local privacy.
differential privacy (DP) is a classical approach to capture and ensure the reliability of private protections.
arXiv Detail & Related papers (2024-08-28T08:22:21Z) - Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization [2.28438857884398]
We introduce three sophisticated algorithms: a Noise-Infusion Technique tailored for high-dimensional image data, a Variational Autoencoder (VAE) for robust feature extraction and an Expectation Maximization (EM) approach optimized for structured data privacy.
Our methods significantly reduce mutual information between sensitive attributes and transformed data, thereby enhancing privacy.
The research contributes to the field by providing a flexible and effective strategy for deploying privacy-preserving algorithms across various data types.
arXiv Detail & Related papers (2024-04-24T22:58:42Z) - FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution, by consolidating collaborative training across multiple data owners.
FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z) - Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z) - On the Inherent Privacy Properties of Discrete Denoising Diffusion Models [17.773335593043004]
We present the pioneering theoretical exploration of the privacy preservation inherent in discrete diffusion models.
Our framework elucidates the potential privacy leakage for each data point in a given training dataset.
Our bounds also show that training with $s$-sized data points leads to a surge in privacy leakage.
arXiv Detail & Related papers (2023-10-24T05:07:31Z) - PS-FedGAN: An Efficient Federated Learning Framework Based on Partially
Shared Generative Adversarial Networks For Data Privacy [56.347786940414935]
Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation.
This work proposes a novel FL framework that requires only partial GAN model sharing.
Named as PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions.
arXiv Detail & Related papers (2023-05-19T05:39:40Z) - Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models are struggling with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.