A Thorough Assessment of the Non-IID Data Impact in Federated Learning
- URL: http://arxiv.org/abs/2503.17070v1
- Date: Fri, 21 Mar 2025 11:53:36 GMT
- Title: A Thorough Assessment of the Non-IID Data Impact in Federated Learning
- Authors: Daniel M. Jimenez-Gutierrez, Mehrdad Hassanzadeh, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti,
- Abstract summary: Decentralized machine learning (FL) allows collaborative machine learning (ML) training among decentralized clients' information, ensuring data privacy.<n>The decentralized nature of FL deals with non-independent and identically distributed (non-IID) data.<n>Despite its importance, experimental studies systematically addressing all types of data heterogeneity (ak.a. non-IIDness) remain scarce.<n>We aim to fill this gap by assessing and quantifying the non-IID effect through a thorough empirical analysis.
- Score: 1.89791888476598
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning (FL) allows collaborative machine learning (ML) model training among decentralized clients' information, ensuring data privacy. The decentralized nature of FL deals with non-independent and identically distributed (non-IID) data. This open problem has notable consequences, such as decreased model performance and more significant convergence times. Despite its importance, experimental studies systematically addressing all types of data heterogeneity (a.k.a. non-IIDness) remain scarce. We aim to fill this gap by assessing and quantifying the non-IID effect through a thorough empirical analysis. We use the Hellinger Distance (HD) to measure differences in distribution among clients. Our study benchmarks four state-of-the-art strategies for handling non-IID data, including label, feature, quantity, and spatiotemporal skewness, under realistic and controlled conditions. This is the first comprehensive analysis of the spatiotemporal skew effect in FL. Our findings highlight the significant impact of label and spatiotemporal skew non-IID types on FL model performance, with notable performance drops occurring at specific HD thresholds. Additionally, the FL performance is heavily affected mainly when the non-IIDness is extreme. Thus, we provide recommendations for FL research to tackle data heterogeneity effectively. Our work represents the most extensive examination of non-IIDness in FL, offering a robust foundation for future research.
Related papers
- Understanding Federated Learning from IID to Non-IID dataset: An Experimental Study [5.680416078423551]
federated learning (FL) has emerged as a promising approach for training machine learning models across decentralized data sources without sharing raw data.<n>A significant challenge in FL is that client data are often non-IID (non-independent and identically distributed), leading to reduced performance compared to centralized learning.
arXiv Detail & Related papers (2025-01-31T21:58:15Z) - FedAD-Bench: A Unified Benchmark for Federated Unsupervised Anomaly Detection in Tabular Data [11.42231457116486]
FedAD-Bench is a benchmark for evaluating unsupervised anomaly detection algorithms within the context of federated learning.
We identify key challenges such as model aggregation inefficiencies and metric unreliability.
Our work aims to establish a standardized benchmark to guide future research and development in federated anomaly detection.
arXiv Detail & Related papers (2024-08-08T13:14:19Z) - Synthetic Data Aided Federated Learning Using Foundation Models [4.666380225768727]
We propose Differentially Private Synthetic Data Aided Federated Learning Using Foundation Models (DPSDA-FL)
Our experimental results have shown that DPSDA-FL can improve class recall and classification accuracy of the global model by up to 26% and 9%, respectively, in FL with Non-IID issues.
arXiv Detail & Related papers (2024-07-06T20:31:43Z) - Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification [51.04894019092156]
Federated learning (FL) has been recognized as a rapidly growing area, where the model is trained over clients under the FL orchestration (PS)
In this paper, we propose a novel primal sparification algorithm for and guarantee non-smooth FL problems.
Its unique insightful properties and its analyses are also presented.
arXiv Detail & Related papers (2023-10-30T14:15:47Z) - Do Deep Neural Networks Always Perform Better When Eating More Data? [82.6459747000664]
We design experiments from Identically Independent Distribution(IID) and Out of Distribution(OOD)
Under IID condition, the amount of information determines the effectivity of each sample, the contribution of samples and difference between classes determine the amount of class information.
Under OOD condition, the cross-domain degree of samples determine the contributions, and the bias-fitting caused by irrelevant elements is a significant factor of cross-domain.
arXiv Detail & Related papers (2022-05-30T15:40:33Z) - FEDIC: Federated Learning on Non-IID and Long-Tailed Data via Calibrated
Distillation [54.2658887073461]
Dealing with non-IID data is one of the most challenging problems for federated learning.
This paper studies the joint problem of non-IID and long-tailed data in federated learning and proposes a corresponding solution called Federated Ensemble Distillation with Imbalance (FEDIC)
FEDIC uses model ensemble to take advantage of the diversity of models trained on non-IID data.
arXiv Detail & Related papers (2022-04-30T06:17:36Z) - Towards Understanding Quality Challenges of the Federated Learning: A
First Look from the Lens of Robustness [4.822471415125479]
Federated learning (FL) aims to preserve users' data privacy while leveraging the entire dataset of all participants for training.
FL still tends to suffer from quality issues such as attacks or byzantine faults.
This paper investigates the effectiveness of state-of-the-art (SOTA) robust FL techniques in the presence of attacks and faults.
arXiv Detail & Related papers (2022-01-05T02:06:39Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z) - FedEval: A Benchmark System with a Comprehensive Evaluation Model for
Federated Learning [17.680627081257246]
In this paper, we propose a comprehensive evaluation framework for federated learning (FL) systems.
We first introduce the ACTPR model, which defines five metrics that cannot be excluded in FL evaluation, including Accuracy, Communication, Time efficiency, Privacy, and Robustness.
We then provide an in-depth benchmarking study between the two most widely-used FL mechanisms, FedSGD and FedAvg.
arXiv Detail & Related papers (2020-11-19T04:59:51Z) - A Principled Approach to Data Valuation for Federated Learning [73.19984041333599]
Federated learning (FL) is a popular technique to train machine learning (ML) models on decentralized data sources.
The Shapley value (SV) defines a unique payoff scheme that satisfies many desiderata for a data value notion.
This paper proposes a variant of the SV amenable to FL, which we call the federated Shapley value.
arXiv Detail & Related papers (2020-09-14T04:37:54Z) - Provably Efficient Causal Reinforcement Learning with Confounded
Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.