Federated XGBoost on Sample-Wise Non-IID Data
- URL: http://arxiv.org/abs/2209.01340v1
- Date: Sat, 3 Sep 2022 06:14:20 GMT
- Title: Federated XGBoost on Sample-Wise Non-IID Data
- Authors: Katelinh Jones, Yuya Jeremy Ong, Yi Zhou, Nathalie Baracaldo
- Abstract summary: Decision tree-based models, in particular XGBoost, can handle non-IID data.
This paper investigates how non-IID data distributions affect the performance of Federated XGBoost.
- Score: 8.49189353769386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning (FL) is a paradigm for jointly training machine
learning models in a decentralized manner: parties communicate with an
aggregator to create and train a shared model without exposing the raw data
held by the local parties involved in the training process. Most FL research
has focused on neural network-based approaches; tree-based methods such as
XGBoost remain underexplored because the algorithm's iterative, additive
construction is difficult to federate. Decision tree-based models, in
particular XGBoost, can handle non-IID data, which is significant for FL
since decentralized data is by nature at risk of being non-IID. In this
paper, we investigate how non-IID distributions affect Federated XGBoost by
running experiments over a range of sample-size-based data skew scenarios. We
conduct extensive experiments across multiple datasets and data skew
partitions. Our results show that, across the various partition ratios, model
performance remained consistent and was close to, or on par with, that of
models trained in a centralized manner.
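To make the skew scenarios concrete, here is a minimal sketch of sample-size-based (quantity-skew) partitioning of the kind the abstract describes. This is not the paper's implementation: the dataset, partition ratios, and XGBoost hyperparameters are illustrative, and the per-party models are plain local baselines rather than a federated aggregate.

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the benchmark datasets used in the paper.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

def partition_by_sample_size(X, y, ratios, seed=0):
    """Split (X, y) across parties with skewed sample counts (quantity skew)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    counts = (np.asarray(ratios) / np.sum(ratios) * len(y)).astype(int)
    parts, start = [], 0
    for c in counts:
        sel = idx[start:start + c]
        parts.append((X[sel], y[sel]))
        start += c
    return parts

# Illustrative skew: one party holds most of the training data.
parties = partition_by_sample_size(X_tr, y_tr, ratios=[70, 20, 10])

# Centralized baseline: the reference point the paper compares against.
central = xgb.XGBClassifier(n_estimators=100, max_depth=4)
central.fit(X_tr, y_tr)
print("centralized acc:", accuracy_score(y_te, central.predict(X_te)))

# Local per-party models (a naive lower bound, not the federated scheme).
for i, (Xp, yp) in enumerate(parties):
    local = xgb.XGBClassifier(n_estimators=100, max_depth=4).fit(Xp, yp)
    print(f"party {i} ({len(yp)} samples) acc:",
          accuracy_score(y_te, local.predict(X_te)))
```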
Related papers
- A review on different techniques used to combat the non-IID and heterogeneous nature of data in FL [0.0]
Federated Learning (FL) is a machine-learning approach enabling collaborative model training across multiple edge devices.
The significance of FL is particularly pronounced in industries such as healthcare and finance, where data privacy holds paramount importance.
This report delves into the issues arising from non-IID and heterogeneous data and explores current algorithms designed to address these challenges.
arXiv Detail & Related papers (2024-01-01T16:34:00Z)
- Efficient Cluster Selection for Personalized Federated Learning: A Multi-Armed Bandit Approach [2.5477011559292175]
Federated learning (FL) offers a decentralized training approach for machine learning models, prioritizing data privacy.
In this paper, we introduce a dynamic Upper Confidence Bound (dUCB) algorithm inspired by the multi-armed bandit (MAB) approach.
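The summary does not spell out dUCB itself; as a reference point, the sketch below implements the classical UCB1 rule that such dynamic variants build on, applied to choosing among candidate clusters. The reward signal (a noisy per-cluster validation accuracy) and the exploration constant are hypothetical.

```python
import math
import random

def ucb1_select(counts, rewards, t, c=2.0):
    """Pick the cluster maximizing mean reward plus an exploration bonus."""
    for k, n in enumerate(counts):        # play every arm once first
        if n == 0:
            return k
    scores = [rewards[k] / counts[k] + math.sqrt(c * math.log(t) / counts[k])
              for k in range(len(counts))]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy loop: 3 candidate clusters; reward = hypothetical validation accuracy.
counts, rewards = [0, 0, 0], [0.0, 0.0, 0.0]
true_acc = [0.70, 0.80, 0.75]             # unknown to the algorithm
for t in range(1, 201):
    k = ucb1_select(counts, rewards, t)
    counts[k] += 1
    rewards[k] += random.gauss(true_acc[k], 0.05)
print("pulls per cluster:", counts)       # the best cluster dominates
```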
arXiv Detail & Related papers (2023-10-29T16:46:50Z)
- FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning [1.4656078321003647]
Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately.
We study the currently popular data partitioning techniques and visualize their main disadvantages.
We propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions.
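Entropy as a handle on partition difficulty can be shown in a few lines: the lower a party's label entropy, the more skewed its local distribution. The sketch below is a generic illustration of that measure, not FedSym's actual construction.

```python
import numpy as np

def label_entropy(labels, n_classes):
    """Shannon entropy (bits) of a party's label histogram: 0 for a single
    class, log2(n_classes) for perfectly uniform (IID-like) labels."""
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

print(label_entropy(np.array([0, 0, 0, 0, 1]), 4))            # ~0.72, skewed
print(label_entropy(np.array([0, 1, 2, 3, 0, 1, 2, 3]), 4))   # 2.0, uniform
```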
arXiv Detail & Related papers (2023-10-11T18:39:08Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
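A minimal sketch of the mixing idea, generating synthetic minority samples as convex combinations of minority and majority points; the paper's iterative scheme is more involved, and the beta parameter here is illustrative.

```python
import numpy as np

def mix_synthetic(X_min, X_maj, n_new, alpha=0.75, seed=0):
    """Synthesize minority samples by mixing minority/majority pairs."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X_min), n_new)
    j = rng.integers(0, len(X_maj), n_new)
    lam = rng.beta(alpha, alpha, size=(n_new, 1))
    lam = np.maximum(lam, 1.0 - lam)      # keep mixes nearer the minority point
    return lam * X_min[i] + (1.0 - lam) * X_maj[j]

# Toy usage: grow a 20-sample minority class with 80 synthetic points.
X_min, X_maj = np.random.randn(20, 5), np.random.randn(500, 5) + 2.0
X_aug = np.vstack([X_min, mix_synthetic(X_min, X_maj, n_new=80)])
```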
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting data from unseen but identically distributed test sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
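In generic form (not necessarily the paper's exact loss), consistency regularization penalizes prediction drift between a sample and an augmented view of it; `model` and `augment` below are placeholder callables.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, augment):
    """KL divergence between predictions on x and on an augmented view."""
    log_p = F.log_softmax(model(x), dim=1)
    with torch.no_grad():                  # augmented view acts as the target
        q = F.softmax(model(augment(x)), dim=1)
    return F.kl_div(log_p, q, reduction="batchmean")

# Typical usage: total = task_loss + lam * consistency_loss(model, x, augment)
```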
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
- Deep Stable Learning for Out-Of-Distribution Generalization [27.437046504902938]
Approaches based on deep neural networks achieve striking performance when testing data and training data share a similar distribution.
Eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models.
We propose to address this problem by removing the dependencies between features via learning weights for training samples.
arXiv Detail & Related papers (2021-04-16T03:54:21Z)
- Cross-Gradient Aggregation for Decentralized Learning from Non-IID data [34.23789472226752]
Decentralized learning enables a group of collaborative agents to learn models using a distributed dataset without the need for a central parameter server.
We propose Cross-Gradient Aggregation (CGA), a novel decentralized learning algorithm.
We show superior learning performance of CGA over existing state-of-the-art decentralized learning algorithms.
arXiv Detail & Related papers (2021-03-02T21:58:12Z)
- Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
- Federated Learning under Importance Sampling [49.17137296715029]
We study the effect of importance sampling and devise schemes for sampling agents and data non-uniformly guided by a performance measure.
We find that in schemes involving sampling without replacement, the performance of the resulting architecture is controlled by two factors related to data variability at each agent.
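As a reference for the non-uniform sampling studied here, the sketch below draws agents with probability proportional to a performance score and attaches the importance weights that keep the aggregate unbiased. It uses sampling with replacement for simplicity (the paper also analyzes without-replacement schemes), and the score vector is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_agents(scores, m):
    """Draw m agents with probability proportional to `scores`; return the
    indices plus 1/(n * p_k) weights so a weighted sample mean stays an
    unbiased estimate of the average over all n agents."""
    p = np.asarray(scores, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(p), size=m, replace=True, p=p)
    return idx, 1.0 / (len(p) * p[idx])

# Toy check: weighted sample mean tracks the true mean of per-agent updates.
updates = rng.normal(size=100)             # one scalar "update" per agent
idx, w = sample_agents(np.abs(updates) + 0.1, m=5000)
print(updates.mean(), np.mean(w * updates[idx]))   # approximately equal
```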
arXiv Detail & Related papers (2020-12-14T10:08:55Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)