On the Role of Server Momentum in Federated Learning
- URL: http://arxiv.org/abs/2312.12670v1
- Date: Tue, 19 Dec 2023 23:56:49 GMT
- Title: On the Role of Server Momentum in Federated Learning
- Authors: Jianhui Sun, Xidong Wu, Heng Huang, Aidong Zhang
- Abstract summary: We propose a general framework for server momentum, that (a) covers a large class of momentum schemes that are unexplored in federated learning (FL)
We provide rigorous convergence analysis for the proposed framework.
- Score: 85.54616432098706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated Averaging (FedAvg) is known to experience convergence issues when
encountering significant clients system heterogeneity and data heterogeneity.
Server momentum has been proposed as an effective mitigation. However, existing
server momentum works are restrictive in the momentum formulation, do not
properly schedule hyperparameters and focus only on system homogeneous
settings, which leaves the role of server momentum still an under-explored
problem. In this paper, we propose a general framework for server momentum,
that (a) covers a large class of momentum schemes that are unexplored in
federated learning (FL), (b) enables a popular stagewise hyperparameter
scheduler, (c) allows heterogeneous and asynchronous local computing. We
provide rigorous convergence analysis for the proposed framework. To our best
knowledge, this is the first work that thoroughly analyzes the performances of
server momentum with a hyperparameter scheduler and system heterogeneity.
Extensive experiments validate the effectiveness of our proposed framework.
Related papers
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z) - Scalable Federated Unlearning via Isolated and Coded Sharding [76.12847512410767]
Federated unlearning has emerged as a promising paradigm to erase the client-level data effect.
This paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing.
arXiv Detail & Related papers (2024-01-29T08:41:45Z) - Client Orchestration and Cost-Efficient Joint Optimization for
NOMA-Enabled Hierarchical Federated Learning [55.49099125128281]
We propose a non-orthogonal multiple access (NOMA) enabled HFL system under semi-synchronous cloud model aggregation.
We show that the proposed scheme outperforms the considered benchmarks regarding HFL performance improvement and total cost reduction.
arXiv Detail & Related papers (2023-11-03T13:34:44Z) - Asynchronous Federated Learning with Incentive Mechanism Based on
Contract Theory [5.502596101979607]
We propose a novel asynchronous FL framework that integrates an incentive mechanism based on contract theory.
Our framework exhibits a 1.35% accuracy improvement over the ideal Local SGD under attacks.
arXiv Detail & Related papers (2023-10-10T09:17:17Z) - Momentum Benefits Non-IID Federated Learning Simply and Provably [22.800862422479913]
Federated learning is a powerful paradigm for large-scale machine learning.
FedAvg and SCAFFOLD are two prominent algorithms to address these challenges.
This paper explores the utilization of momentum to enhance the performance of FedAvg and SCAFFOLD.
arXiv Detail & Related papers (2023-06-28T18:52:27Z) - FedSkip: Combatting Statistical Heterogeneity with Federated Skip
Aggregation [95.85026305874824]
We introduce a data-driven approach called FedSkip to improve the client optima by periodically skipping federated averaging and scattering local models to the cross devices.
We conduct extensive experiments on a range of datasets to demonstrate that FedSkip achieves much higher accuracy, better aggregation efficiency and competing communication efficiency.
arXiv Detail & Related papers (2022-12-14T13:57:01Z) - Asynchronous Hierarchical Federated Learning [10.332084068006345]
Asynchronous hierarchical federated learning is proposed to solve problems of heavy server traffic, long periods of convergence, and unreliable accuracy.
A special aggregator device is selected to enable hierarchical learning, so that the burden of the server can be significantly reduced.
We evaluate the proposed algorithm on CIFAR-10 image classification task.
arXiv Detail & Related papers (2022-05-31T18:42:29Z) - Federated Stochastic Gradient Descent Begets Self-Induced Momentum [151.4322255230084]
Federated learning (FL) is an emerging machine learning method that can be applied in mobile edge systems.
We show that running to the gradient descent (SGD) in such a setting can be viewed as adding a momentum-like term to the global aggregation process.
arXiv Detail & Related papers (2022-02-17T02:01:37Z) - Hybrid intelligence for dynamic job-shop scheduling with deep
reinforcement learning and attention mechanism [28.28095225164155]
We formulate the DJSP as a Markov decision process (MDP) to be tackled by reinforcement learning (RL)
We propose a flexible hybrid framework that takes disjunctive graphs as states and a set of general dispatching rules as the action space with minimum prior domain knowledge.
We present Gymjsp, a public benchmark based on the well-known OR-Library, to provide a standardized off-the-shelf facility for RL and DJSP research communities.
arXiv Detail & Related papers (2022-01-03T09:38:13Z) - Byzantine-robust Federated Learning through Spatial-temporal Analysis of
Local Model Updates [6.758334200305236]
Federated Learning (FL) enables multiple distributed clients (e.g., mobile devices) to collaboratively train a centralized model while keeping the training data locally on the client.
In this paper, we propose to mitigate these failures and attacks from a spatial-temporal perspective.
Specifically, we use a clustering-based method to detect and exclude incorrect updates by leveraging their geometric properties in the parameter space.
arXiv Detail & Related papers (2021-07-03T18:48:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.