CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources
- URL: http://arxiv.org/abs/2402.08614v2
- Date: Sat, 8 Jun 2024 17:07:35 GMT
- Title: CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources
- Authors: Sikha Pentyala, Mayana Pereira, Martine De Cock
- Abstract summary: We propose a framework for the collaborative and private generation of synthetic data from distributed data holders.
We provide input privacy by replacing the trusted aggregator with secure multi-party computation (MPC) protocols, and output privacy via differential privacy (DP).
We demonstrate the applicability and scalability of our approach for the state-of-the-art select-measure-generate algorithms MWEM+PGM and AIM.
- Score: 5.898893619901382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data is the lifeblood of the modern world, forming a fundamental part of AI, decision-making, and research advances. With increasing interest in data, governments have taken important steps towards a regulated data world, drastically impacting data sharing and data usability and resulting in massive amounts of data confined within the walls of organizations. While synthetic data generation (SDG) is an appealing solution to break down these walls and enable data sharing, the main drawback of existing solutions is the assumption of a trusted aggregator for generative model training. Given that many data holders may not want to, or be legally allowed to, entrust a central entity with their raw data, we propose a framework for the collaborative and private generation of synthetic tabular data from distributed data holders. Our solution is general, applicable to any marginal-based SDG, and provides input privacy by replacing the trusted aggregator with secure multi-party computation (MPC) protocols and output privacy via differential privacy (DP). We demonstrate the applicability and scalability of our approach for the state-of-the-art select-measure-generate SDG algorithms MWEM+PGM and AIM.
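The abstract does not spell out the protocol, so the following is only a minimal sketch, under our own assumptions, of what the "measure" step of a marginal-based select-measure-generate pipeline can look like without a trusted aggregator. Each data holder additively secret-shares its local marginal counts among three computing parties, each party adds up the shares it received, the party totals are opened, and Gaussian noise is added. The modulus, the three-party setup, the helper names (`local_marginal`, `share`, `aggregate`, `noisy_measurement`) and the toy data are hypothetical; this is not CaPS's actual protocol, and the "select" and "generate" steps are not shown.

```python
import random
import secrets
from collections import Counter
from itertools import product

# Prime modulus for additive secret sharing (an assumption): any modulus larger
# than the largest possible aggregate count works for this toy example.
MODULUS = 2**61 - 1


def local_marginal(rows, attrs, domains):
    """Contingency-table counts of `rows` over `attrs`, in a fixed cell order."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    return [counts.get(cell, 0) for cell in product(*(domains[a] for a in attrs))]


def share(vector, n_parties):
    """Split each count into n_parties random shares that sum to it mod MODULUS."""
    if n_parties < 2:
        raise ValueError("need at least two computing parties")
    random_shares = [[secrets.randbelow(MODULUS) for _ in vector]
                     for _ in range(n_parties - 1)]
    last_share = [(v - sum(col)) % MODULUS
                  for v, col in zip(vector, zip(*random_shares))]
    return random_shares + [last_share]


def aggregate(holder_shares):
    """Each party sums the shares it holds; adding the party totals opens the sum.

    holder_shares[h][p] is the share vector that data holder h sent to party p.
    Each party sees only uniformly random shares, never a holder's raw counts.
    """
    n_parties = len(holder_shares[0])
    party_totals = [
        [sum(col) % MODULUS for col in zip(*(h[p] for h in holder_shares))]
        for p in range(n_parties)
    ]
    return [sum(col) % MODULUS for col in zip(*party_totals)]


def noisy_measurement(counts, sigma, rng):
    """Gaussian-noised marginal; calibrating sigma to (epsilon, delta) is omitted.

    Simplification: noise is added in the clear after opening. Keeping the exact
    marginal hidden from the parties would require sampling noise inside MPC.
    """
    return [c + rng.gauss(0.0, sigma) for c in counts]


# Hypothetical toy data: two data holders, two binary attributes.
domains = {"age": ["young", "old"], "smoker": ["no", "yes"]}
holders = [
    [{"age": "young", "smoker": "no"}, {"age": "old", "smoker": "yes"}],
    [{"age": "young", "smoker": "no"}, {"age": "young", "smoker": "yes"}],
]
attrs = ("age", "smoker")

holder_shares = [share(local_marginal(rows, attrs, domains), 3) for rows in holders]
exact = aggregate(holder_shares)  # [2, 1, 0, 1]
noisy = noisy_measurement(exact, sigma=1.0, rng=random.Random(0))
print(exact, noisy)
```

Additive sharing is a natural fit here because summing marginals across holders needs only local additions by each party, with no interaction until the result is opened. Adding or removing one row changes each cell count of a marginal by at most one, which is what a DP noise calibration would build on.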
Related papers
- DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing [0.8739101659113155]
We introduce an effective data publishing algorithm, DP-CDA.
Our proposed algorithm generates synthetic datasets by randomly mixing data in a class-specific manner and introducing carefully tuned randomness to ensure privacy guarantees.
Our results indicate that synthetic datasets produced using the DP-CDA can achieve superior utility compared to those generated by traditional data publishing algorithms, even when subject to the same privacy requirements.
Date: 2024-11-25T06:14:06Z
- Generative AI for Secure and Privacy-Preserving Mobile Crowdsensing [74.58071278710896]
Generative AI has attracted much attention from both academia and industry.
Secure and privacy-preserving mobile crowdsensing (SPPMCS) has been widely applied in data collection and acquisition.
Date: 2024-05-17T04:00:58Z
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
Date: 2023-12-10T07:38:56Z
- FLAIM: AIM-based Synthetic Data Generation in the Federated Setting [18.38046354606749]
DistAIM and FLAIM are proposed to produce synthetic data that mirrors the statistical properties of private data.
We show that naively federating AIM can lead to substantial degradation in utility under the presence of heterogeneity.
Date: 2023-10-05T10:34:47Z
- Libertas: Privacy-Preserving Computation for Decentralised Personal Data Stores [19.54818218429241]
We propose a modular design for integrating Secure Multi-Party Computation with Solid.
Our architecture, Libertas, requires no protocol-level changes in the underlying design of Solid.
We show how this can be combined with existing differential privacy techniques to also ensure output privacy.
Date: 2023-09-28T12:07:40Z
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
Date: 2023-09-27T14:38:16Z
- PS-FedGAN: An Efficient Federated Learning Framework Based on Partially Shared Generative Adversarial Networks For Data Privacy [56.347786940414935]
Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation.
This work proposes a novel FL framework that requires only partial GAN model sharing.
Named PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions.
Date: 2023-05-19T05:39:40Z
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle to produce synthetic samples of high utility.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
Date: 2022-11-07T10:02:55Z
- Secure Multiparty Computation for Synthetic Data Generation from Distributed Data [7.370727048591523]
Legal and ethical restrictions on accessing relevant data inhibit data science research in critical domains such as health, finance, and education.
Existing approaches assume that the data holders supply their raw data to a trusted curator, who uses it as fuel for synthetic data generation.
We propose the first solution in which data holders only share encrypted data for differentially private synthetic data generation.
Date: 2022-10-13T20:09:17Z
- Mitigating Leakage from Data Dependent Communications in Decentralized Computing using Differential Privacy [1.911678487931003]
We propose a general execution model to control the data-dependence of communications in user-side decentralized computations.
Our formal privacy guarantees leverage and extend recent results on privacy amplification by shuffling.
Date: 2021-12-23T08:30:17Z
- Data Sharing Markets [95.13209326119153]
We study a setup where each agent can be both buyer and seller of data.
We consider two cases: bilateral data exchange (trading data with data) and unilateral data exchange (trading data with money).
Date: 2021-07-19T06:00:34Z
This list is automatically generated from the titles and abstracts of the papers on this site.