Differentially Private Federated Clustering over Non-IID Data
- URL: http://arxiv.org/abs/2301.00955v3
- Date: Mon, 30 Oct 2023 14:59:23 GMT
- Title: Differentially Private Federated Clustering over Non-IID Data
- Authors: Yiwei Li, Shuai Wang, Chong-Yung Chi, Tony Q. S. Quek
- Abstract summary: The federated clustering (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into finite clusters under the orchestration of a server.
We propose a novel FedC algorithm using the differential privacy (DP) technique, referred to as DP-FedC, in which partial client participation and multiple local model updating steps are also considered.
Various attributes of the proposed DP-FedC are obtained through theoretical analyses of privacy protection and convergence rate, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
- Score: 59.611244450530315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the federated clustering (FedC)
problem, which aims to accurately partition unlabeled data samples distributed
over massive clients into finite clusters under the orchestration of a
parameter server, while preserving data privacy. Although it is an NP-hard
optimization problem involving real variables denoting cluster centroids and
binary variables denoting the cluster membership of each data sample, we
judiciously reformulate the FedC problem into a non-convex optimization
problem with only one convex constraint, accordingly yielding a soft
clustering solution. Then a novel FedC algorithm using the differential
privacy (DP) technique, referred to as DP-FedC, is proposed, in which partial
client participation and multiple local model updating steps are also
considered. Furthermore, various attributes of the proposed DP-FedC are
obtained through theoretical analyses of privacy protection and convergence
rate, especially for the case of non-identically and independently distributed
(non-i.i.d.) data, which serve as guidelines for the design of the proposed
DP-FedC. Finally, experimental results on two real datasets demonstrate the
efficacy of the proposed DP-FedC, its superior performance over
state-of-the-art FedC algorithms, and its consistency with all the presented
analytical results.
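To make the pipeline above concrete, here is a minimal Python sketch of one DP-FedC-style round, assuming soft k-means local updates, update clipping, and Gaussian noise; the clipping bound `clip`, noise scale `sigma`, and sampling fraction `frac` are illustrative assumptions, not the paper's exact mechanism.

```python
# Hedged sketch of a DP federated soft-clustering round (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)

def soft_assign(X, centroids, temp=1.0):
    """Soft cluster memberships via a softmax over negative squared distances."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # n x K
    logits = -d2 / temp
    logits -= logits.max(axis=1, keepdims=True)                  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def local_update(X, centroids, steps=3):
    """Multiple local model updating steps of soft k-means on one client's data."""
    c = centroids.copy()
    for _ in range(steps):
        w = soft_assign(X, c)
        c = (w.T @ X) / (w.sum(axis=0)[:, None] + 1e-12)
    return c

def dp_fedc_round(clients, centroids, frac=0.5, clip=1.0, sigma=0.5):
    """One round: sample a client subset, clip and noise each update, average."""
    m = max(1, int(frac * len(clients)))
    chosen = rng.choice(len(clients), size=m, replace=False)  # partial participation
    updates = []
    for i in chosen:
        delta = local_update(clients[i], centroids) - centroids
        delta *= min(1.0, clip / (np.linalg.norm(delta) + 1e-12))  # bound sensitivity
        delta += rng.normal(0.0, sigma * clip, delta.shape)        # Gaussian DP noise
        updates.append(delta)
    return centroids + np.mean(updates, axis=0)

# Toy run: 4 clients with non-i.i.d. 2-D data, K = 2 clusters.
clients = [rng.normal(loc, 0.3, size=(50, 2))
           for loc in ([0, 0], [0, 1], [3, 3], [3, 4])]
centroids = rng.normal(size=(2, 2))
for _ in range(20):
    centroids = dp_fedc_round(clients, centroids)
print(np.round(centroids, 2))
```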
Related papers
- Privacy-preserving Quantification of Non-IID Degree in Federated Learning [22.194684042923406]
Federated learning (FL) offers a privacy-preserving approach to machine learning for multiple collaborators without sharing raw data.
The existence of non-independent and non-identically distributed (non-IID) datasets across different clients presents a significant challenge to FL.
This paper proposes a quantitative definition of the non-IID degree in the federated environment by employing the cumulative distribution function.
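A hedged sketch of one way such a CDF-based non-IID degree could be computed, assuming it is the average Kolmogorov-Smirnov-style gap between each client's empirical label CDF and the global one; the paper's exact definition may differ.

```python
# Illustrative non-IID degree via label CDFs (assumed definition, not the paper's).
import numpy as np

def label_cdf(labels, num_classes):
    counts = np.bincount(labels, minlength=num_classes)
    return np.cumsum(counts / max(len(labels), 1))

def non_iid_degree(client_labels, num_classes):
    global_cdf = label_cdf(np.concatenate(client_labels), num_classes)
    gaps = [np.abs(label_cdf(y, num_classes) - global_cdf).max()
            for y in client_labels]
    return float(np.mean(gaps))  # 0 = i.i.d. clients; larger = more label skew

rng = np.random.default_rng(0)
iid = [rng.integers(0, 10, 500) for _ in range(5)]
skewed = [rng.choice([2 * i, 2 * i + 1], 500) for i in range(5)]
print(non_iid_degree(iid, 10), non_iid_degree(skewed, 10))
```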
arXiv Detail & Related papers (2024-06-14T03:08:53Z) - Rethinking Clustered Federated Learning in NOMA Enhanced Wireless
Networks [60.09912912343705]
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-independent and identically distributed (non-IID) datasets.
A detailed theoretical analysis of the generalization gap that measures the degree of non-IID in the data distribution is presented.
Solutions to address the challenges posed by non-IID conditions are proposed based on an analysis of these properties.
arXiv Detail & Related papers (2024-03-05T17:49:09Z) - Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification [51.04894019092156]
Federated learning (FL) has been recognized as a rapidly growing research area, where a model is trained over distributed clients under the orchestration of a parameter server (PS).
In this paper, we propose a novel privacy-preserving federated primal-dual algorithm with model sparsification for non-convex and non-smooth FL problems.
Its unique properties and theoretical analyses are also presented.
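As one concrete reading of the model-sparsification idea, the sketch below keeps only the top-k largest-magnitude entries of a local update before upload; `k` and the interface are illustrative assumptions, and the paper's primal-dual iterations and privacy mechanism are not reproduced.

```python
# Illustrative top-k model sparsification of a client update (assumed scheme).
import numpy as np

def sparsify_top_k(update, k):
    """Zero out all but the k largest-magnitude coordinates of `update`."""
    flat = update.ravel()
    if k >= flat.size:
        return update
    thresh = np.partition(np.abs(flat), -k)[-k]
    return update * (np.abs(update) >= thresh)  # ties may keep slightly more than k

rng = np.random.default_rng(0)
delta = rng.normal(size=(4, 5))
sparse = sparsify_top_k(delta, k=5)
print(np.count_nonzero(sparse))  # roughly k surviving entries
```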
arXiv Detail & Related papers (2023-10-30T14:15:47Z) - Federated Two Stage Decoupling With Adaptive Personalization Layers [5.69361786082969]
Federated learning has gained significant attention due to its ability to enable distributed learning while maintaining privacy constraints.
However, it inherently experiences significant learning degradation and slow convergence when client data are heterogeneous.
It is natural to employ the concept of clustering homogeneous clients into the same group, allowing only the model weights within each group to be aggregated.
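A minimal sketch of that grouping idea, assuming clients are clustered by plain k-means over their flattened model updates and aggregation is restricted to each group; the paper's two-stage decoupling and personalization layers are not modeled.

```python
# Illustrative group-wise aggregation over clustered client updates (assumed scheme).
import numpy as np

def group_clients(updates, num_groups, iters=10, seed=0):
    """Plain k-means over flattened client updates; returns group labels."""
    rng = np.random.default_rng(seed)
    U = np.stack([u.ravel() for u in updates])
    centers = U[rng.choice(len(U), num_groups, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((U[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for g in range(num_groups):
            if np.any(labels == g):
                centers[g] = U[labels == g].mean(axis=0)
    return labels

def aggregate_per_group(updates, labels, num_groups):
    """FedAvg restricted to each group: one aggregated model per cluster."""
    return [np.mean([u for u, l in zip(updates, labels) if l == g], axis=0)
            for g in range(num_groups) if np.any(labels == g)]

rng = np.random.default_rng(1)
updates = [rng.normal(0, 1, (3, 3)) for _ in range(4)] + \
          [rng.normal(5, 1, (3, 3)) for _ in range(4)]
labels = group_clients(updates, num_groups=2)
print(labels, len(aggregate_per_group(updates, labels, 2)))
```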
arXiv Detail & Related papers (2023-08-30T07:46:32Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
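A brief sketch of the mixing step, assuming synthetic minority samples are convex combinations of randomly paired minority and majority points with a Beta-distributed weight; the paper's iterative pair selection is simplified away.

```python
# Illustrative minority/majority mixing for imbalanced data (assumed pairing rule).
import numpy as np

def mix_minority(X_min, X_maj, n_new, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X_min), n_new)
    j = rng.integers(0, len(X_maj), n_new)
    lam = rng.beta(alpha, alpha, size=(n_new, 1))
    lam = np.maximum(lam, 1 - lam)          # stay closer to the minority point
    return lam * X_min[i] + (1 - lam) * X_maj[j]

rng = np.random.default_rng(0)
X_min, X_maj = rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (200, 2))
synthetic = mix_minority(X_min, X_maj, n_new=50)
print(synthetic.shape)  # (50, 2) new minority-leaning samples
```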
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Personalized Graph Federated Learning with Differential Privacy [6.282767337715445]
This paper presents a personalized graph federated learning (PGFL) framework in which distributedly connected servers and their respective edge devices collaboratively learn device or cluster-specific models.
We study a variant of the PGFL implementation that utilizes differential privacy, specifically zero-concentrated differential privacy, where a noise sequence perturbs model exchanges.
Our analysis shows that the algorithm ensures local differential privacy for all clients in terms of zero-concentrated differential privacy.
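The zero-concentrated DP perturbation can be sketched with the standard Gaussian mechanism, whose noise scale sigma = sensitivity / sqrt(2 * rho) yields rho-zCDP; the PGFL server graph and personalization are not modeled here, and `rho` and the clipping bound are assumed parameters.

```python
# Gaussian mechanism satisfying rho-zCDP for a model exchange (standard calibration).
import numpy as np

def zcdp_perturb(model, sensitivity, rho, rng):
    """Add Gaussian noise with sigma = sensitivity / sqrt(2 * rho)."""
    sigma = sensitivity / np.sqrt(2.0 * rho)
    return model + rng.normal(0.0, sigma, model.shape)

rng = np.random.default_rng(0)
weights = rng.normal(size=10)
clipped = weights * min(1.0, 1.0 / np.linalg.norm(weights))  # bound L2 sensitivity
noisy = zcdp_perturb(clipped, sensitivity=1.0, rho=0.1, rng=rng)
print(np.round(noisy, 2))
```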
arXiv Detail & Related papers (2023-06-10T09:52:01Z) - CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with
Clustered Aggregation and Knowledge DIStilled Regularization [3.3711670942444014]
Federated learning enables edge devices to train a global model collaboratively without exposing their data.
We tackle a new type of Non-IID data, called cluster-skewed non-IID, discovered in actual data sets.
We propose an aggregation scheme that guarantees equality between clusters.
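One plausible reading of cluster-equality in aggregation, sketched below: average within each client cluster first, then weight the cluster means equally so that a large cluster cannot dominate; the paper's knowledge-distilled regularization is not reproduced.

```python
# Illustrative cluster-fair aggregation (assumed reading of "equality between clusters").
import numpy as np

def cluster_equal_aggregate(updates, labels):
    groups = sorted(set(labels))
    means = [np.mean([u for u, l in zip(updates, labels) if l == g], axis=0)
             for g in groups]
    return np.mean(means, axis=0)  # each cluster contributes equally

rng = np.random.default_rng(0)
updates = [rng.normal(0, 1, 4) for _ in range(9)] + [rng.normal(5, 1, 4)]
labels = [0] * 9 + [1]            # cluster-skewed: 9 clients vs 1 client
print(np.round(cluster_equal_aggregate(updates, labels), 2))
```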
arXiv Detail & Related papers (2023-02-21T02:53:37Z) - A One-shot Framework for Distributed Clustered Learning in Heterogeneous
Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
The one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z) - Sample-based and Feature-based Federated Learning via Mini-batch SSCA [18.11773963976481]
This paper investigates sample-based and feature-based federated optimization.
We show that the proposed algorithms can preserve data privacy through the model aggregation mechanism.
We also show that the proposed algorithms converge to Karush-Kuhn-Tucker points of the respective federated optimization problems.
arXiv Detail & Related papers (2021-04-13T08:23:46Z) - A Unified Linear Speedup Analysis of Federated Averaging and Nesterov
FedAvg [49.76940694847521]
Federated learning (FL) learns a model jointly from a set of participating devices without sharing each other's privately held data.
In this paper, we focus on Federated Averaging (FedAvg), one of the most popular and effective FL algorithms in use today.
We show that FedAvg enjoys linear speedup in each case, although with different convergence rates and communication efficiencies.
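For reference, a compact FedAvg sketch on a toy least-squares objective: sampled devices run several local SGD steps from the global model and the server averages the results; the objective, step sizes, and sampling fraction are toy assumptions, not the paper's experimental setup.

```python
# Minimal FedAvg loop on a toy least-squares problem.
import numpy as np

def local_sgd(w, data, steps=5, lr=0.1):
    """Local SGD steps on 0.5 * ||Xw - y||^2 / n for one device."""
    X, y = data
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg(clients, w, rounds=50, frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        m = max(1, int(frac * len(clients)))
        chosen = rng.choice(len(clients), m, replace=False)  # device sampling
        w = np.mean([local_sgd(w, clients[i]) for i in chosen], axis=0)
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
clients = []
for _ in range(8):
    X = rng.normal(size=(30, 2))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=30)))
print(np.round(fedavg(clients, np.zeros(2)), 2))  # approaches w_true
```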
arXiv Detail & Related papers (2020-07-11T05:59:08Z)