Federated Learning on Non-IID Data Silos: An Experimental Study
- URL: http://arxiv.org/abs/2102.02079v2
- Date: Thu, 4 Feb 2021 06:45:28 GMT
- Title: Federated Learning on Non-IID Data Silos: An Experimental Study
- Authors: Qinbin Li, Yiqun Diao, Quan Chen, Bingsheng He
- Abstract summary: Training data have been increasingly fragmented, forming distributed databases of multiple data silos.
In this paper, we propose comprehensive data partitioning strategies to cover the typical non-IID data cases.
We find that non-IID data does pose significant challenges to the learning accuracy of FL algorithms, and that no single state-of-the-art FL algorithm outperforms the others in all cases.
- Score: 34.28108345251376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning services have been emerging in many data-intensive
applications, and their effectiveness highly relies on large-volume
high-quality training data. However, due to the increasing privacy concerns and
data regulations, training data have been increasingly fragmented, forming
distributed databases of multiple data silos (e.g., within different
organizations and countries). To develop effective machine learning services,
there is a pressing need to exploit data from such distributed databases
without exchanging the raw data. Recently, federated learning (FL) has emerged
as a solution of growing interest, which enables multiple parties to collaboratively
train a machine learning model without exchanging their local data. A key and common
challenge on distributed databases is the heterogeneity of the data
distribution (i.e., non-IID) among the parties. Many FL algorithms have been
proposed to improve learning effectiveness under non-IID data settings.
However, an experimental study that systematically examines their advantages
and disadvantages is still lacking, as previous studies adopt very rigid data
partitioning strategies among parties, which are hardly representative or
thorough. In this paper, to help researchers better understand and study the
non-IID data setting in federated learning, we propose comprehensive data
partitioning strategies to cover the typical non-IID data cases. Moreover, we
conduct extensive experiments to evaluate state-of-the-art FL algorithms. We
find that non-IID data does pose significant challenges to the learning accuracy
of FL algorithms, and that no single state-of-the-art FL algorithm outperforms
the others in all cases. Our experiments provide insights for future studies of
addressing the challenges in data silos.
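A common way to realize the label-distribution-skew case among the partitioning strategies discussed above is Dirichlet-based partitioning, where each class is divided among parties according to proportions drawn from a Dirichlet distribution. The sketch below is illustrative (function name and the `beta=0.5` default are assumptions, not taken from the paper); smaller `beta` yields more skewed, i.e., more strongly non-IID, partitions.

```python
import numpy as np

def dirichlet_partition(labels, n_parties, beta=0.5, seed=0):
    """Split sample indices across parties with label-distribution skew.

    For each class, the fraction of its samples assigned to each party is
    drawn from Dirichlet(beta); smaller beta -> more skewed (more non-IID).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    parties = [[] for _ in range(n_parties)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Proportion of class-c samples that each party receives.
        props = rng.dirichlet([beta] * n_parties)
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for party, chunk in zip(parties, np.split(idx, cuts)):
            party.extend(chunk.tolist())
    return parties

# Example: 10 classes, 100 samples each, split across 5 parties.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, n_parties=5, beta=0.1)
# Every sample index is assigned to exactly one party.
assert sorted(i for p in parts for i in p) == list(range(1000))
```

With `beta=0.1`, most parties end up holding only a few of the ten classes, which is exactly the kind of skew that degrades FL accuracy in the experiments.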
Related papers
- Non-IID data in Federated Learning: A Systematic Review with Taxonomy, Metrics, Methods, Frameworks and Future Directions [2.9434966603161072]
This systematic review aims to fill a gap by providing a detailed taxonomy for non-IID data, partition protocols, and metrics.
We describe popular solutions to address non-IID data and standardized frameworks employed in Federated Learning with heterogeneous data.
arXiv Detail & Related papers (2024-11-19T09:53:28Z)
- A review on different techniques used to combat the non-IID and heterogeneous nature of data in FL [0.0]
Federated Learning (FL) is a machine-learning approach enabling collaborative model training across multiple edge devices.
The significance of FL is particularly pronounced in industries such as healthcare and finance, where data privacy holds paramount importance.
This report delves into the issues arising from non-IID and heterogeneous data and explores current algorithms designed to address these challenges.
arXiv Detail & Related papers (2024-01-01T16:34:00Z)
- FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning [1.4656078321003647]
Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately.
We study the currently popular data partitioning techniques and visualize their main disadvantages.
We propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions.
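One standard way to quantify how challenging a client's shard is, in the entropy-based spirit of this work, is the Shannon entropy of its empirical label distribution: uniform (IID-like) shards score the maximum of log2(n_classes) bits, while single-class shards score near zero. This is a generic sketch of that metric, not FedSym's exact construction.

```python
import numpy as np

def label_entropy(labels, n_classes):
    """Shannon entropy (in bits) of a client's empirical label distribution.

    Maximal (log2 n_classes) for a uniform/IID shard; near zero when a
    client holds essentially one class.
    """
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty classes to avoid log(0)
    return float(-(p * np.log2(p)).sum())

iid_shard = [0, 1, 2, 3] * 25          # uniform over 4 classes
skewed_shard = [0] * 99 + [1]          # almost a single class
print(label_entropy(iid_shard, 4))     # 2.0 bits (maximum for 4 classes)
print(label_entropy(skewed_shard, 4))  # close to 0
```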
arXiv Detail & Related papers (2023-10-11T18:39:08Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem; in fact, it can be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Federated XGBoost on Sample-Wise Non-IID Data [8.49189353769386]
Decision tree-based models, in particular XGBoost, can handle non-IID data.
This paper investigates how Federated XGBoost is impacted by non-IID distributions.
arXiv Detail & Related papers (2022-09-03T06:14:20Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Towards Federated Long-Tailed Learning [76.50892783088702]
Data privacy and class imbalance are the norm rather than the exception in many machine learning tasks.
Recent attempts address, on the one hand, the problem of learning from pervasive private data and, on the other, learning from long-tailed data.
This paper focuses on learning with long-tailed (LT) data distributions under the context of the popular privacy-preserved federated learning (FL) framework.
arXiv Detail & Related papers (2022-06-30T02:34:22Z)
- FEDIC: Federated Learning on Non-IID and Long-Tailed Data via Calibrated Distillation [54.2658887073461]
Dealing with non-IID data is one of the most challenging problems for federated learning.
This paper studies the joint problem of non-IID and long-tailed data in federated learning and proposes a corresponding solution called Federated Ensemble Distillation with Imbalance (FEDIC).
FEDIC uses model ensemble to take advantage of the diversity of models trained on non-IID data.
arXiv Detail & Related papers (2022-04-30T06:17:36Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Federated Learning on Non-IID Data: A Survey [11.431837357827396]
Federated learning is an emerging distributed machine learning framework for privacy preservation.
Models trained in federated learning usually have worse performance than those trained in the standard centralized learning mode.
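The baseline these surveys evaluate against is typically FedAvg-style aggregation, where the server averages client models weighted by local dataset size; under non-IID data this averaging is exactly where accuracy degrades. A minimal sketch, with illustrative names and flat parameter vectors standing in for real model weights:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of clients' parameter vectors (FedAvg aggregation).

    Each client's model is weighted by its local dataset size, so larger
    silos contribute proportionally more to the global model.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()  # per-client mixing coefficients
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    return (coeffs[:, None] * stacked).sum(axis=0)

# Two clients; the one with 300 samples dominates the average.
global_w = fedavg([[1.0, 1.0], [0.0, 0.0]], client_sizes=[300, 100])
print(global_w)  # [0.75 0.75]
```

When clients' local data distributions diverge, the locally optimal weights being averaged also diverge, which is one intuition for the performance gap versus centralized training noted above.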
arXiv Detail & Related papers (2021-06-12T19:45:35Z)
- Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.