FLAIM: AIM-based Synthetic Data Generation in the Federated Setting
- URL: http://arxiv.org/abs/2310.03447v3
- Date: Sun, 28 Jul 2024 16:44:40 GMT
- Title: FLAIM: AIM-based Synthetic Data Generation in the Federated Setting
- Authors: Samuel Maddock, Graham Cormode, Carsten Maple,
- Abstract summary: DistAIM and FLAIM are proposed to produce synthetic data that mirrors the statistical properties of private data.
We show that naively federating AIM can lead to substantial degradation in utility under the presence of heterogeneity.
- Score: 18.38046354606749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preserving individual privacy while enabling collaborative data sharing is crucial for organizations. Synthetic data generation is one solution, producing artificial data that mirrors the statistical properties of private data. While numerous techniques have been devised under differential privacy, they predominantly assume data is centralized. However, data is often distributed across multiple clients in a federated manner. In this work, we initiate the study of federated synthetic tabular data generation. Building upon a SOTA central method known as AIM, we present DistAIM and FLAIM. We first show that it is straightforward to distribute AIM, extending a recent approach based on secure multi-party computation which necessitates additional overhead, making it less suited to federated scenarios. We then demonstrate that naively federating AIM can lead to substantial degradation in utility under the presence of heterogeneity. To mitigate both issues, we propose an augmented FLAIM approach that maintains a private proxy of heterogeneity. We simulate our methods across a range of benchmark datasets under different degrees of heterogeneity and show we can improve utility while reducing overhead.
Related papers
- EPIC: Enhancing Privacy through Iterative Collaboration [4.199844472131922]
Traditional machine learning techniques require centralized data collection and processing.
Privacy, ownership, and stringent regulation issues exist when pooling medical data into centralized storage.
The Federated learning (FL) approach overcomes such issues by setting up a central aggregator server and a shared global model.
arXiv Detail & Related papers (2024-11-07T20:10:34Z) - FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution, by consolidating collaborative training across multiple data owners.
FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z) - CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources [5.898893619901382]
We propose a framework for the collaborative and private generation of synthetic data from distributed data holders.
We replace the trusted aggregator with secure multi-party computation protocols and output privacy via differential privacy (DP)
We demonstrate the applicability and scalability of our approach for the state-of-the-art select-measure-generate algorithms MWEM+PGM and AIM.
arXiv Detail & Related papers (2024-02-13T17:26:32Z) - Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z) - PS-FedGAN: An Efficient Federated Learning Framework Based on Partially
Shared Generative Adversarial Networks For Data Privacy [56.347786940414935]
Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation.
This work proposes a novel FL framework that requires only partial GAN model sharing.
Named as PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions.
arXiv Detail & Related papers (2023-05-19T05:39:40Z) - Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv) have already been proposed.
As a side product of this work, we release the non-IID version of the datasets we used so to facilitate further comparisons from the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z) - Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models are struggling with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z) - Rethinking Data Heterogeneity in Federated Learning: Introducing a New
Notion and Standard Benchmarks [65.34113135080105]
We show that not only the issue of data heterogeneity in current setups is not necessarily a problem but also in fact it can be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z) - Mitigating Data Heterogeneity in Federated Learning with Data
Augmentation [26.226057709504733]
Federated Learning (FL) is a framework that enables training a centralized model while securing user privacy by fusing local, decentralized models.
One major obstacle is data heterogeneity, i.e., each client having non-identically and independently distributed (non-IID) data.
Recent evidence suggests that data augmentation can induce equal or greater performance.
arXiv Detail & Related papers (2022-06-20T19:47:43Z) - FedMix: Approximation of Mixup under Mean Augmented Federated Learning [60.503258658382]
Federated learning (FL) allows edge devices to collectively learn a model without directly sharing data within each device.
Current state-of-the-art algorithms suffer from performance degradation as the heterogeneity of local data across clients increases.
We propose a new augmentation algorithm, named FedMix, which is inspired by a phenomenal yet simple data augmentation method, Mixup.
arXiv Detail & Related papers (2021-07-01T06:14:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.