Privacy-preserving Quantification of Non-IID Degree in Federated Learning
- URL: http://arxiv.org/abs/2406.09682v1
- Date: Fri, 14 Jun 2024 03:08:53 GMT
- Title: Privacy-preserving Quantification of Non-IID Degree in Federated Learning
- Authors: Yuping Yan, Yizhi Wang, Yingchao Yu, Yaochu Jin,
- Abstract summary: Federated learning (FL) offers a privacy-preserving approach to machine learning for multiple collaborators without sharing raw data.
The existence of non-independent and non-identically distributed (non-IID) datasets across different clients presents a significant challenge to FL.
This paper proposes a quantitative definition of the non-IID degree in the federated environment by employing the cumulative distribution function.
- Score: 22.194684042923406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) offers a privacy-preserving approach to machine learning for multiple collaborators without sharing raw data. However, the existence of non-independent and non-identically distributed (non-IID) datasets across different clients presents a significant challenge to FL, leading to a sharp drop in accuracy, reduced efficiency, and hindered implementation. To address the non-IID problem, various methods have been proposed, including clustering and personalized FL frameworks. Nevertheless, to date, a formal quantitative definition of the non-IID degree between different clients' datasets is still missing, hindering the clients from comparing and obtaining an overview of their data distributions with other clients. For the first time, this paper proposes a quantitative definition of the non-IID degree in the federated environment by employing the cumulative distribution function (CDF), called Fully Homomorphic Encryption-based Federated Cumulative Distribution Function (FHE-FCDF). This method utilizes cryptographic primitive fully homomorphic encryption to enable clients to estimate the non-IID degree while ensuring privacy preservation. The experiments conducted on the CIFAR-100 non-IID dataset validate the effectiveness of our proposed method.
Related papers
- Non-IID data in Federated Learning: A Systematic Review with Taxonomy, Metrics, Methods, Frameworks and Future Directions [2.9434966603161072]
This systematic review aims to fill a gap by providing a detailed taxonomy for non-IID data, partition protocols, and metrics.
We describe popular solutions to address non-IID data and standardized frameworks employed in Federated Learning with heterogeneous data.
arXiv Detail & Related papers (2024-11-19T09:53:28Z) - Approximate Gradient Coding for Privacy-Flexible Federated Learning with Non-IID Data [9.984630251008868]
This work focuses on the challenges of non-IID data and stragglers/dropouts in federated learning.
We introduce and explore a privacy-flexible paradigm that models parts of the clients' local data as non-private.
arXiv Detail & Related papers (2024-04-04T15:29:50Z) - Federated Causal Discovery from Heterogeneous Data [70.31070224690399]
We propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data.
These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy.
We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method.
arXiv Detail & Related papers (2024-02-20T18:53:53Z) - DCFL: Non-IID awareness Data Condensation aided Federated Learning [0.8158530638728501]
Federated learning is a decentralized learning paradigm wherein a central server trains a global model iteratively by utilizing clients who possess a certain amount of private datasets.
The challenge lies in the fact that the client side private data may not be identically and independently distributed.
We propose DCFL which divides clients into groups by using the Centered Kernel Alignment (CKA) method, then uses dataset condensation methods with non-IID awareness to complete clients.
arXiv Detail & Related papers (2023-12-21T13:04:24Z) - Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification [51.04894019092156]
Federated learning (FL) has been recognized as a rapidly growing area, where the model is trained over clients under the FL orchestration (PS)
In this paper, we propose a novel primal sparification algorithm for and guarantee non-smooth FL problems.
Its unique insightful properties and its analyses are also presented.
arXiv Detail & Related papers (2023-10-30T14:15:47Z) - Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z) - CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with
Clustered Aggregation and Knowledge DIStilled Regularization [3.3711670942444014]
Federated learning enables edge devices to train a global model collaboratively without exposing their data.
We tackle a new type of Non-IID data, called cluster-skewed non-IID, discovered in actual data sets.
We propose an aggregation scheme that guarantees equality between clusters.
arXiv Detail & Related papers (2023-02-21T02:53:37Z) - Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
clustering clusters (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into finite clients under the orchestration of a server.
We propose a novel FedC algorithm using differential privacy convergence technique, referred to as DP-Fed, in which partial participation and multiple clients are also considered.
Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z) - FEDIC: Federated Learning on Non-IID and Long-Tailed Data via Calibrated
Distillation [54.2658887073461]
Dealing with non-IID data is one of the most challenging problems for federated learning.
This paper studies the joint problem of non-IID and long-tailed data in federated learning and proposes a corresponding solution called Federated Ensemble Distillation with Imbalance (FEDIC)
FEDIC uses model ensemble to take advantage of the diversity of models trained on non-IID data.
arXiv Detail & Related papers (2022-04-30T06:17:36Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.