Federated Learning on Non-IID Data Silos: An Experimental Study
- URL: http://arxiv.org/abs/2102.02079v2
- Date: Thu, 4 Feb 2021 06:45:28 GMT
- Title: Federated Learning on Non-IID Data Silos: An Experimental Study
- Authors: Qinbin Li, Yiqun Diao, Quan Chen, Bingsheng He
- Abstract summary: Training data have been increasingly fragmented, forming distributed databases of multiple data silos.
In this paper, we propose comprehensive data partitioning strategies to cover the typical non-IID data cases.
We find that non-IID data does pose significant challenges to the learning accuracy of FL algorithms, and that no single state-of-the-art FL algorithm outperforms the others in all cases.
- Score: 34.28108345251376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning services have been emerging in many data-intensive
applications, and their effectiveness highly relies on large-volume
high-quality training data. However, due to the increasing privacy concerns and
data regulations, training data have been increasingly fragmented, forming
distributed databases of multiple data silos (e.g., within different
organizations and countries). To develop effective machine learning services,
there is a pressing need to exploit data from such distributed databases
without exchanging the raw data. Recently, federated learning (FL) has emerged
as a solution of growing interest, which enables multiple parties to collaboratively
train a machine learning model without exchanging their local data. A key and common
challenge on distributed databases is the heterogeneity of the data
distribution (i.e., non-IID) among the parties. Many FL algorithms have been
proposed to improve learning effectiveness under non-IID data settings.
However, an experimental study that systematically examines their advantages
and disadvantages is still lacking, as previous studies adopt very rigid data
partitioning strategies among parties, which are hardly representative or
thorough. In this paper, to help researchers better understand and study the
non-IID data setting in federated learning, we propose comprehensive data
partitioning strategies to cover the typical non-IID data cases. Moreover, we
conduct extensive experiments to evaluate state-of-the-art FL algorithms. We
find that non-IID data does pose significant challenges to the learning accuracy
of FL algorithms, and that no single state-of-the-art FL algorithm outperforms
the others in all cases. Our experiments provide insights for future studies of
addressing the challenges in data silos.
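A common way to realize the label-distribution-skew case among the partitioning strategies discussed above is Dirichlet-based partitioning, where each class is divided among parties according to proportions drawn from a Dirichlet distribution. The sketch below is illustrative (function name and the `beta=0.5` default are assumptions, not taken from the paper); smaller `beta` yields more skewed, i.e., more strongly non-IID, partitions.

```python
import numpy as np

def dirichlet_partition(labels, n_parties, beta=0.5, seed=0):
    """Split sample indices across parties with label-distribution skew.

    For each class, the fraction of its samples assigned to each party is
    drawn from Dirichlet(beta); smaller beta -> more skewed (more non-IID).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    parties = [[] for _ in range(n_parties)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Proportion of class-c samples that each party receives.
        props = rng.dirichlet([beta] * n_parties)
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for party, chunk in zip(parties, np.split(idx, cuts)):
            party.extend(chunk.tolist())
    return parties

# Example: 10 classes, 100 samples each, split across 5 parties.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, n_parties=5, beta=0.1)
# Every sample index is assigned to exactly one party.
assert sorted(i for p in parts for i in p) == list(range(1000))
```

With `beta=0.1`, most parties end up holding only a few of the ten classes, which is exactly the kind of skew that degrades FL accuracy in the experiments.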
Related papers
- Non-IID data in Federated Learning: A Systematic Review with Taxonomy, Metrics, Methods, Frameworks and Future Directions [2.9434966603161072]
This systematic review aims to fill a gap by providing a detailed taxonomy for non-IID data, partition protocols, and metrics.
We describe popular solutions to address non-IID data and standardized frameworks employed in Federated Learning with heterogeneous data.
arXiv Detail & Related papers (2024-11-19T09:53:28Z)
- A review on different techniques used to combat the non-IID and heterogeneous nature of data in FL [0.0]
Federated Learning (FL) is a machine-learning approach enabling collaborative model training across multiple edge devices.
The significance of FL is particularly pronounced in industries such as healthcare and finance, where data privacy holds paramount importance.
This report delves into the issues arising from non-IID and heterogeneous data and explores current algorithms designed to address these challenges.
arXiv Detail & Related papers (2024-01-01T16:34:00Z)
- FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning [1.4656078321003647]
Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately.
We study the currently popular data partitioning techniques and visualize their main disadvantages.
We propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions.
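One standard way to quantify how challenging a client's shard is, in the entropy-based spirit of this work, is the Shannon entropy of its empirical label distribution: uniform (IID-like) shards score the maximum of log2(n_classes) bits, while single-class shards score near zero. This is a generic sketch of that metric, not FedSym's exact construction.

```python
import numpy as np

def label_entropy(labels, n_classes):
    """Shannon entropy (in bits) of a client's empirical label distribution.

    Maximal (log2 n_classes) for a uniform/IID shard; near zero when a
    client holds essentially one class.
    """
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty classes to avoid log(0)
    return float(-(p * np.log2(p)).sum())

iid_shard = [0, 1, 2, 3] * 25          # uniform over 4 classes
skewed_shard = [0] * 99 + [1]          # almost a single class
print(label_entropy(iid_shard, 4))     # 2.0 bits (maximum for 4 classes)
print(label_entropy(skewed_shard, 4))  # close to 0
```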
arXiv Detail & Related papers (2023-10-11T18:39:08Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem; in fact, it can be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Federated XGBoost on Sample-Wise Non-IID Data [8.49189353769386]
Decision tree-based models, in particular XGBoost, can handle non-IID data.
This paper investigates how Federated XGBoost is impacted by non-IID distributions.
arXiv Detail & Related papers (2022-09-03T06:14:20Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Towards Federated Long-Tailed Learning [76.50892783088702]
Data privacy and class imbalance are the norm rather than the exception in many machine learning tasks.
Recent attempts address, on the one hand, the problem of learning from pervasive private data and, on the other, learning from long-tailed data.
This paper focuses on learning with long-tailed (LT) data distributions under the context of the popular privacy-preserved federated learning (FL) framework.
arXiv Detail & Related papers (2022-06-30T02:34:22Z)
- FEDIC: Federated Learning on Non-IID and Long-Tailed Data via Calibrated Distillation [54.2658887073461]
Dealing with non-IID data is one of the most challenging problems for federated learning.
This paper studies the joint problem of non-IID and long-tailed data in federated learning and proposes a corresponding solution called Federated Ensemble Distillation with Imbalance (FEDIC).
FEDIC uses model ensemble to take advantage of the diversity of models trained on non-IID data.
arXiv Detail & Related papers (2022-04-30T06:17:36Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Federated Learning on Non-IID Data: A Survey [11.431837357827396]
Federated learning is an emerging distributed machine learning framework for privacy preservation.
Models trained in federated learning usually have worse performance than those trained in the standard centralized learning mode.
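The baseline these surveys evaluate against is typically FedAvg-style aggregation, where the server averages client models weighted by local dataset size; under non-IID data this averaging is exactly where accuracy degrades. A minimal sketch, with illustrative names and flat parameter vectors standing in for real model weights:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of clients' parameter vectors (FedAvg aggregation).

    Each client's model is weighted by its local dataset size, so larger
    silos contribute proportionally more to the global model.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()  # per-client mixing coefficients
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    return (coeffs[:, None] * stacked).sum(axis=0)

# Two clients; the one with 300 samples dominates the average.
global_w = fedavg([[1.0, 1.0], [0.0, 0.0]], client_sizes=[300, 100])
print(global_w)  # [0.75 0.75]
```

When clients' local data distributions diverge, the locally optimal weights being averaged also diverge, which is one intuition for the performance gap versus centralized training noted above.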
arXiv Detail & Related papers (2021-06-12T19:45:35Z)
- Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.