Federated clustering with GAN-based data synthesis
- URL: http://arxiv.org/abs/2210.16524v2
- Date: Mon, 23 Oct 2023 11:01:10 GMT
- Title: Federated clustering with GAN-based data synthesis
- Authors: Jie Yan, Jing Liu, Ji Qi and Zhong-Yuan Zhang
- Abstract summary: Federated clustering (FC) is an extension of centralized clustering in federated settings.
We propose a new federated clustering framework, named synthetic data aided federated clustering (SDA-FC).
It trains a generative adversarial network (GAN) locally in each client and uploads the generated synthetic data to the server, where k-means (KM) or fuzzy c-means (FCM) is performed on the synthetic data.
The synthetic data can make the model immune to the non-IID problem and enable us to capture the global similarity characteristics more effectively without sharing private data.
- Score: 12.256298398007848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated clustering (FC) is an extension of centralized clustering in
federated settings. The key here is how to construct a global similarity
measure without sharing private data, since the local similarity may be
insufficient to group local data correctly and the similarity of samples across
clients cannot be directly measured due to privacy constraints. Obviously, the
most straightforward way to analyze FC is to employ the methods extended from
centralized ones, such as K-means (KM) and fuzzy c-means (FCM). However, they
are vulnerable to non independent-and-identically-distributed (non-IID) data
among clients. To handle this, we propose a new federated clustering framework,
named synthetic data aided federated clustering (SDA-FC). It trains a generative
adversarial network locally in each client and uploads the generated synthetic
data to the server, where KM or FCM is performed on the synthetic data. The
synthetic data can make the model immune to the non-IID problem and enable us
to capture the global similarity characteristics more effectively without
sharing private data. Comprehensive experiments reveal the advantages of
SDA-FC, including superior performance in addressing the non-IID problem and
device failures.
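The SDA-FC pipeline described in the abstract can be sketched as follows. This is a minimal simulation, not the authors' implementation: each client's locally trained GAN is replaced here by a simple Gaussian sampler (an assumption for illustration), and the server runs plain k-means on the pooled synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a non-IID setting: each client holds samples from only one
# of the three underlying clusters.
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
clients = [rng.normal(c, 0.5, size=(200, 2)) for c in true_centers]

def synthesize(local_data, n_samples=200):
    """Stand-in for the per-client GAN: fit a Gaussian to the local
    data and sample synthetic points from it (illustrative only)."""
    mu = local_data.mean(axis=0)
    cov = np.cov(local_data, rowvar=False)
    return rng.multivariate_normal(mu, cov, size=n_samples)

# Clients upload only synthetic samples; raw data never leaves a client.
server_pool = np.vstack([synthesize(x) for x in clients])

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = ((X[:, None] - np.array(centers)) ** 2).sum(-1).min(axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

# Server-side clustering on the pooled synthetic data recovers the
# global cluster structure that no single client could see alone.
centers, labels = kmeans(server_pool, k=3)
```

Even though every client is single-cluster (extreme non-IID), the pooled synthetic data lets the server recover all three global centroids; substituting FCM for k-means at the server follows the same pattern.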
Related papers
- PS-FedGAN: An Efficient Federated Learning Framework Based on Partially
Shared Generative Adversarial Networks For Data Privacy [56.347786940414935]
Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation.
This work proposes a novel FL framework that requires only partial GAN model sharing.
Named PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions.
arXiv Detail & Related papers (2023-05-19T05:39:40Z) - Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv) have already been proposed.
As a side product of this work, we release the non-IID versions of the datasets we used, so as to facilitate further comparisons within the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z) - CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with
Clustered Aggregation and Knowledge DIStilled Regularization [3.3711670942444014]
Federated learning enables edge devices to train a global model collaboratively without exposing their data.
We tackle a new type of Non-IID data, called cluster-skewed non-IID, discovered in actual data sets.
We propose an aggregation scheme that guarantees equality between clusters.
arXiv Detail & Related papers (2023-02-21T02:53:37Z) - Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
The federated clustering (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into a finite number of clusters under the orchestration of a server.
We propose a novel FedC algorithm using a differential privacy technique, referred to as DP-Fed, in which partial participation of clients is also considered.
Various properties of the proposed DP-Fed are established through theoretical analyses of privacy protection, especially for the case of non-independently-and-identically-distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z) - Privacy-Preserving Federated Deep Clustering based on GAN [12.256298398007848]
We present a novel approach to federated deep clustering based on generative adversarial networks (GANs).
Each client trains a generative adversarial network (GAN) locally and uploads the synthetic data to the server.
The server applies a deep clustering network to the synthetic data to establish $k$ cluster centroids, which are then downloaded to the clients for cluster assignment.
arXiv Detail & Related papers (2022-11-30T13:20:11Z) - Rethinking Data Heterogeneity in Federated Learning: Introducing a New
Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z) - Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated
Learning via Class-Imbalance Reduction [76.26710990597498]
We show that the class-imbalance of the grouped data from randomly selected clients can lead to significant performance degradation.
Based on our key observation, we design an efficient client sampling mechanism, Federated Class-balanced Sampling (Fed-CBS).
In particular, we propose a measure of class-imbalance and then employ homomorphic encryption to derive this measure in a privacy-preserving way.
arXiv Detail & Related papers (2022-09-30T05:42:56Z) - Efficient Distribution Similarity Identification in Clustered Federated
Learning via Principal Angles Between Client Data Subspaces [59.33965805898736]
Clustered federated learning has been shown to produce promising results by grouping clients into clusters.
Existing FL algorithms are essentially trying to group together clients with similar data distributions.
Prior FL algorithms attempt to infer these similarities indirectly during training.
arXiv Detail & Related papers (2022-09-21T17:37:54Z) - Federated Learning with GAN-based Data Synthesis for Non-IID Clients [8.304185807036783]
Federated learning (FL) has recently emerged as a popular privacy-preserving collaborative learning paradigm.
We propose a novel framework, named Synthetic Data Aided Federated Learning (SDA-FL), to resolve this non-IID challenge by sharing synthetic data.
arXiv Detail & Related papers (2022-06-11T11:43:25Z) - Secure Federated Clustering [18.37669220755388]
SecFC is a secure federated clustering algorithm that simultaneously achieves universal performance and privacy preservation.
Each client's private data and the cluster centers are not leaked to other clients or the server.
arXiv Detail & Related papers (2022-05-31T06:47:18Z) - Heterogeneous Federated Learning via Grouped Sequential-to-Parallel
Training [60.892342868936865]
Federated learning (FL) is a rapidly growing privacy-preserving collaborative machine learning paradigm.
We propose a data heterogeneous-robust FL approach, FedGSP, to address this challenge.
We show that FedGSP improves the accuracy by 3.7% on average compared with seven state-of-the-art approaches.
arXiv Detail & Related papers (2022-01-31T03:15:28Z)
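The Privacy-Preserving Federated Deep Clustering entry above ends with clients downloading the server's $k$ centroids and assigning their own samples locally. That client-side assignment step can be sketched as below; the function name and toy values are illustrative, not taken from the paper.

```python
import numpy as np

def assign_clusters(local_data, centroids):
    """Client-side step: assign each local sample to the nearest
    centroid downloaded from the server (illustrative sketch)."""
    # Pairwise distances: (n_samples, k) after broadcasting.
    dists = np.linalg.norm(local_data[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# Toy example: two downloaded centroids, three local samples.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
local_data = np.array([[0.5, -0.2], [9.8, 10.1], [0.1, 0.3]])
labels = assign_clusters(local_data, centroids)  # → array([0, 1, 0])
```

Because assignment happens on-device against shared centroids, raw samples never leave the client, which is the privacy property both SDA-FC and the GAN-based deep clustering paper rely on.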
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of any of the information presented and is not responsible for any consequences of its use.