Algorithms for Collaborative Machine Learning under Statistical Heterogeneity
- URL: http://arxiv.org/abs/2408.00050v1
- Date: Wed, 31 Jul 2024 16:32:34 GMT
- Title: Algorithms for Collaborative Machine Learning under Statistical Heterogeneity
- Authors: Seok-Ju Hahn
- Abstract summary: Federated learning is currently the de facto standard for training a machine learning model across heterogeneous data owners.
In this dissertation, three major factors can be considered as starting points: the \textit{parameter}, the \textit{mixing coefficient}, and the \textit{local data distributions}.
- Score: 1.8130068086063336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning from distributed data without accessing them is undoubtedly a challenging and non-trivial task. Nevertheless, the necessity of distributed training of a statistical model has been increasing, due to the privacy concerns of local data owners and the cost of centralizing massively distributed data. Federated learning (FL) is currently the de facto standard for training a machine learning model across heterogeneous data owners without moving the raw data out of local silos. However, several challenges must be addressed for FL to be more practical in reality. Among these challenges, the statistical heterogeneity problem is the most significant and requires immediate attention. From the main objective of FL, three major factors can be considered as starting points: the \textit{parameter}, the \textit{mixing coefficient}, and the \textit{local data distributions}. In alignment with these components, this dissertation is organized into three parts. In Chapter II, a novel personalization method, \texttt{SuPerFed}, inspired by mode connectivity, is introduced. In Chapter III, an adaptive decision-making algorithm, \texttt{AAggFF}, is introduced for inducing uniform performance distributions across participating clients, realized within an online convex optimization framework. Finally, in Chapter IV, a collaborative synthetic data generation method, \texttt{FedEvg}, is introduced, leveraging the flexibility and compositionality of an energy-based modeling approach. Taken together, these approaches provide practical solutions for mitigating the statistical heterogeneity problem in data-decentralized settings, paving the way for distributed systems and applications built on collaborative machine learning methods.
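To make the three factors concrete: in the standard FL objective, the server learns global parameters by minimizing a weighted sum of the clients' local risks, where the weights are the mixing coefficients and each local risk is taken over a client's local data distribution. Below is a minimal, illustrative sketch of the resulting server-side aggregation step (plain FedAvg-style weighted averaging); the function and variable names are assumptions for illustration, not taken from the dissertation.

```python
import numpy as np

def aggregate(client_params, mixing_coefficients):
    """Server-side FL aggregation: theta = sum_k w_k * theta_k.

    client_params:        list of 1-D parameter vectors, one per client.
    mixing_coefficients:  non-negative weights w_k summing to 1; in FedAvg,
                          w_k is proportional to client k's sample count,
                          which reflects its local data distribution.
    """
    stacked = np.stack(client_params)                   # shape (K, d)
    weights = np.asarray(mixing_coefficients)[:, None]  # shape (K, 1)
    return (weights * stacked).sum(axis=0)              # shape (d,)

# Toy round with three clients holding unequal amounts of data.
params = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([0.0, 1.0])]
num_samples = np.array([100, 50, 50])       # hypothetical local dataset sizes
w = num_samples / num_samples.sum()         # FedAvg mixing coefficients
print(aggregate(params, w))                 # -> [1.25 1.25]
```

Read against this sketch, each chapter of the dissertation intervenes on one ingredient: the parameters (\texttt{SuPerFed}), the mixing coefficients (\texttt{AAggFF}), or the local data distributions themselves (\texttt{FedEvg}).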
Related papers
- Client Contribution Normalization for Enhanced Federated Learning [4.726250115737579]
Mobile devices, including smartphones and laptops, generate decentralized and heterogeneous data.
Federated Learning (FL) offers a promising alternative by enabling collaborative training of a global model across decentralized devices without data sharing.
This paper focuses on data-dependent heterogeneity in FL and proposes a novel approach leveraging mean latent representations extracted from locally trained models.
arXiv Detail & Related papers (2024-11-10T04:03:09Z)
- A review on different techniques used to combat the non-IID and heterogeneous nature of data in FL [0.0]
Federated Learning (FL) is a machine-learning approach enabling collaborative model training across multiple edge devices.
The significance of FL is particularly pronounced in industries such as healthcare and finance, where data privacy holds paramount importance.
This report delves into the issues arising from non-IID and heterogeneous data and explores current algorithms designed to address these challenges.
arXiv Detail & Related papers (2024-01-01T16:34:00Z)
- FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning [1.4656078321003647]
Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately.
We study the currently popular data partitioning techniques and visualize their main disadvantages.
We propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions.
arXiv Detail & Related papers (2023-10-11T18:39:08Z)
- Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate a loss function minimization problem under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z)
- Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneity.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
- Clustered Data Sharing for Non-IID Federated Learning over Wireless Networks [39.80420645943706]
Federated Learning (FL) is a distributed machine learning approach that leverages data from the Internet of Things (IoT).
Current FL algorithms face the challenge of non-independent and identically distributed (non-IID) data, which causes high communication costs and declines in model accuracy.
We propose a clustered data sharing framework which shares partial data from cluster heads with credible associates through device-to-device (D2D) communication.
arXiv Detail & Related papers (2023-02-17T07:11:02Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework that tackles distributional heterogeneity across clients and inter-client noise simultaneously.
We provide a comprehensive theoretical analysis, including robustness, convergence, and generalization guarantees.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- FedH2L: Federated Learning with Model and Statistical Heterogeneity [75.61234545520611]
Federated learning (FL) enables distributed participants to collectively learn a strong global model without sacrificing their individual data privacy.
We introduce FedH2L, which is agnostic to the model architecture and robust to different data distributions across participants.
In contrast to approaches sharing parameters or gradients, FedH2L relies on mutual distillation, exchanging only posteriors on a shared seed set between participants in a decentralized manner (an illustrative sketch follows the list below).
arXiv Detail & Related papers (2021-01-27T10:10:18Z)
- Mitigating Bias in Federated Learning [9.295028968787351]
In this paper, we discuss causes of bias in federated learning (FL).
We propose three pre-processing and in-processing methods to mitigate bias, without compromising data privacy.
We conduct experiments over several data distributions to analyze their effects on model performance, fairness metrics, and bias learning patterns.
arXiv Detail & Related papers (2020-12-04T08:04:12Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
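For the mutual-distillation exchange referenced in the FedH2L entry above, a minimal sketch is given below. The loss form (cross-entropy on the local task plus a KL term pulling the local model toward a peer's posteriors on the shared seed set), the weighting `alpha`, and all names are illustrative assumptions rather than FedH2L's exact formulation.

```python
import torch
import torch.nn.functional as F

def mutual_distillation_loss(local_logits, peer_posteriors, labels, alpha=0.5):
    """Illustrative FedH2L-style objective on a shared seed batch.

    local_logits:    (B, C) logits of this participant's model.
    peer_posteriors: (B, C) class probabilities received from a peer --
                     the only quantity exchanged, not parameters or gradients.
    labels:          (B,) ground-truth labels for the seed batch.
    """
    ce = F.cross_entropy(local_logits, labels)       # local supervised loss
    log_p = F.log_softmax(local_logits, dim=1)       # local log-posteriors
    kl = F.kl_div(log_p, peer_posteriors, reduction="batchmean")
    return ce + alpha * kl                           # distill from the peer

# Toy shapes: a seed batch of 4 examples with 3 classes.
logits = torch.randn(4, 3)
peer = F.softmax(torch.randn(4, 3), dim=1)
y = torch.tensor([0, 2, 1, 0])
print(mutual_distillation_loss(logits, peer, y))
```

Because only the (B, C) posterior matrix crosses the network, this style of exchange stays cheap and works across heterogeneous model architectures, which is the property the FedH2L summary emphasizes.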