Federated Learning Empowered by Generative Content
- URL: http://arxiv.org/abs/2312.05807v1
- Date: Sun, 10 Dec 2023 07:38:56 GMT
- Title: Federated Learning Empowered by Generative Content
- Authors: Rui Ye, Xinyu Zhu, Jingyi Chai, Siheng Chen, Yanfeng Wang
- Abstract summary: Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
- Score: 55.576885852501775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) enables leveraging distributed private data for model
training in a privacy-preserving way. However, data heterogeneity significantly
limits the performance of current FL methods. In this paper, we propose a novel
FL framework termed FedGC, designed to mitigate data heterogeneity issues by
diversifying private data with generative content. FedGC is a
simple-to-implement framework as it only introduces a one-shot step of data
generation. For data generation, we identify three crucial aspects worth
exploring (budget allocation, prompt design, and generation guidance) and
propose three candidate solutions for each. Specifically, to achieve a better
trade-off between data diversity and fidelity for generation guidance, we
propose to generate data based on the guidance of prompts and real data
simultaneously. The generated data is then merged with private data to
facilitate local model training. Such generative data increases the diversity
of private data to prevent each client from fitting the potentially biased
private data, alleviating the issue of data heterogeneity. We conduct a
systematic empirical study on FedGC, covering diverse baselines, datasets,
scenarios, and modalities. Interesting findings include (1) FedGC consistently
and significantly enhances the performance of FL methods, even when notable
disparities exist between generative and private data; (2) FedGC achieves both
better performance and privacy preservation. We hope this work inspires
future research to further explore the potential of enhancing FL with
generative content.
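The core FedGC mechanism described above (a one-shot generation step, after which each client merges generated samples with its private data for local training) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name `merge_for_local_training` and the `gen_ratio` cap on the generated fraction are hypothetical choices introduced here.

```python
import random

def merge_for_local_training(private_data, generated_data, gen_ratio=0.5):
    """Mix a client's private samples with generated samples.

    gen_ratio caps the fraction of generated samples in the merged set,
    so synthetic data diversifies local training without dominating it.
    (Illustrative helper; the paper does not prescribe an exact ratio.)
    """
    # Largest generated-sample count that keeps its share at most gen_ratio.
    max_gen = int(len(private_data) * gen_ratio / (1.0 - gen_ratio))
    merged = list(private_data) + list(generated_data[:max_gen])
    random.shuffle(merged)  # avoid ordering bias during local training
    return merged

# Example: a client holding 8 private samples receives 10 generated ones.
private = [("priv", i) for i in range(8)]
generated = [("gen", i) for i in range(10)]
merged = merge_for_local_training(private, generated, gen_ratio=0.5)
```

With `gen_ratio=0.5`, at most as many generated samples as private samples are kept, so a biased local distribution is diversified while real data still anchors training.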
Related papers
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
- FLASH: Federated Learning Across Simultaneous Heterogeneities [54.80435317208111]
FLASH(Federated Learning Across Simultaneous Heterogeneities) is a lightweight and flexible client selection algorithm.
It outperforms state-of-the-art FL frameworks across diverse sources of heterogeneity.
It achieves substantial and consistent improvements over state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-13T20:04:39Z)
- Fake It Till Make It: Federated Learning with Consensus-Oriented Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG).
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
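The knowledge-distillation component mentioned above can be illustrated with a standard distillation loss, in which the local (student) model is trained on the true label while being pulled toward the global (teacher) model's soft predictions. This is a generic KD sketch, not FedCOG's actual objective; the hyperparameters `alpha` and `t` are illustrative values introduced here.

```python
import math

def softmax(logits, t=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / t) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, alpha=0.5, t=2.0):
    """Cross-entropy on the true label plus a distillation term.

    alpha weights the KL term; t softens both distributions. Both are
    illustrative hyperparameters, not values taken from the paper.
    """
    ce = -math.log(softmax(student_logits)[label])
    q = softmax(teacher_logits, t)   # teacher soft targets
    p = softmax(student_logits, t)   # student soft predictions
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))
    # t**2 rescales gradients of the softened KL term, as is conventional.
    return (1.0 - alpha) * ce + alpha * (t ** 2) * kl
```

When the student matches the teacher exactly, the KL term vanishes and only the label loss remains; a disagreeing teacher increases the loss, nudging the local model toward the global consensus.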
arXiv Detail & Related papers (2023-12-10T18:49:59Z)
- A Simple Data Augmentation for Feature Distribution Skewed Federated Learning [12.636154758643757]
Federated learning (FL) facilitates collaborative learning among multiple clients in a distributed manner, while ensuring privacy protection.
In this paper, we focus on the feature distribution skewed FL scenario, which is widespread in real-world applications.
We propose FedRDN, a simple yet remarkably effective data augmentation method for feature distribution skewed FL.
arXiv Detail & Related papers (2023-06-14T05:46:52Z)
- PS-FedGAN: An Efficient Federated Learning Framework Based on Partially Shared Generative Adversarial Networks For Data Privacy [56.347786940414935]
Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation.
This work proposes a novel FL framework that requires only partial GAN model sharing.
Named PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions.
arXiv Detail & Related papers (2023-05-19T05:39:40Z)
- Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv) have already been proposed.
As a side product of this work, we release the non-IID versions of the datasets we used, so as to facilitate further comparisons by the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact benefit the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Federated Learning in Non-IID Settings Aided by Differentially Private Synthetic Data [20.757477553095637]
Federated learning (FL) is a privacy-promoting framework that enables clients to collaboratively train machine learning models.
A major challenge in federated learning arises when the local data is heterogeneous.
We propose FedDPMS, an FL algorithm in which clients deploy variational auto-encoders to augment local datasets with data synthesized using differentially private means of latent data representations.
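The differentially private means that FedDPMS shares can be illustrated with the standard recipe for a DP mean release: clip each latent vector's L2 norm to bound sensitivity, average, then add Gaussian noise. This is a generic sketch under assumptions, not the paper's algorithm; `clip_norm` and `noise_scale` are illustrative parameters, and calibrating the noise to a specific (epsilon, delta) budget is out of scope here.

```python
import random

def dp_mean(latents, clip_norm=1.0, noise_scale=0.5):
    """Release a noisy mean of latent vectors (Gaussian-mechanism style).

    Each vector is clipped to L2 norm clip_norm, bounding any single
    record's influence on the mean; Gaussian noise then masks it.
    """
    dim = len(latents[0])
    clipped = []
    for v in latents:
        norm = sum(x * x for x in v) ** 0.5
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in v])
    mean = [sum(v[i] for v in clipped) / len(clipped) for i in range(dim)]
    # Noise shrinks with the number of records, since per-record
    # sensitivity of the mean is clip_norm / n.
    sigma = noise_scale * clip_norm / len(latents)
    return [m + random.gauss(0.0, sigma) for m in mean]

# Example: two 2-D latent vectors; the first is clipped from norm 5 to 1.
noisy = dp_mean([[3.0, 4.0], [0.0, 0.0]], clip_norm=1.0, noise_scale=0.5)
```

Clients could share such noisy means of latent representations so that peers synthesize complementary data without exposing raw samples, which is the spirit of the FedDPMS approach summarized above.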
arXiv Detail & Related papers (2022-06-01T18:00:48Z)
- Synthesizing Property & Casualty Ratemaking Datasets using Generative Adversarial Networks [2.2649197740853677]
We show how to design three types of generative adversarial networks (GANs) that can build a synthetic insurance dataset from a confidential original dataset.
For transparency, the approaches are illustrated using a public dataset, the French motor third party liability data.
We find that the MC-WGAN-GP synthesizes the best data, the CTGAN is the easiest to use, and the MNCDP-GAN guarantees differential privacy.
arXiv Detail & Related papers (2020-08-13T21:02:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.