On the Role of Server Momentum in Federated Learning
- URL: http://arxiv.org/abs/2312.12670v1
- Date: Tue, 19 Dec 2023 23:56:49 GMT
- Title: On the Role of Server Momentum in Federated Learning
- Authors: Jianhui Sun, Xidong Wu, Heng Huang, Aidong Zhang
- Abstract summary: We propose a general framework for server momentum that (a) covers a large class of momentum schemes unexplored in federated learning (FL), (b) enables a popular stagewise hyperparameter scheduler, and (c) allows heterogeneous and asynchronous local computing.
We provide rigorous convergence analysis for the proposed framework.
- Score: 85.54616432098706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated Averaging (FedAvg) is known to experience convergence issues when
encountering significant client system heterogeneity and data heterogeneity.
Server momentum has been proposed as an effective mitigation. However, existing
server momentum works are restrictive in the momentum formulation, do not
properly schedule hyperparameters, and focus only on system-homogeneous
settings, which leaves the role of server momentum an under-explored
problem. In this paper, we propose a general framework for server momentum,
that (a) covers a large class of momentum schemes that are unexplored in
federated learning (FL), (b) enables a popular stagewise hyperparameter
scheduler, (c) allows heterogeneous and asynchronous local computing. We
provide rigorous convergence analysis for the proposed framework. To our best
knowledge, this is the first work that thoroughly analyzes the performance of
server momentum with a hyperparameter scheduler and system heterogeneity.
Extensive experiments validate the effectiveness of our proposed framework.
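As a concrete illustration, the following minimal sketch shows one server round with heavy-ball server momentum, a common instance of the broader momentum family the paper studies. The function name, signature, and default hyperparameters are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def server_momentum_round(w, client_deltas, velocity, beta=0.9, server_lr=1.0):
    """One hypothetical server round with heavy-ball momentum.

    client_deltas: list of pseudo-gradients (w_old - w_local) reported by clients.
    """
    avg_delta = np.mean(client_deltas, axis=0)  # aggregate client pseudo-gradients
    velocity = beta * velocity + avg_delta      # server-side momentum buffer
    w = w - server_lr * velocity                # global model update
    return w, velocity
```

With beta = 0, this reduces to plain FedAvg aggregation; nonzero beta lets past rounds' aggregated updates carry over, which is the mechanism the paper generalizes and schedules stagewise.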
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
- Efficient Federated Learning against Heterogeneous and Non-stationary Client Unavailability [23.466997173249034]
FedAPM includes novel structures that compensate for missed computations due to unavailability, with only $O(1)$ additional memory and computation relative to standard FedAvg.
We show that FedAPM converges to a stationary point despite the non-stationary client availability dynamics.
arXiv Detail & Related papers (2024-09-26T00:38:18Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
- Scalable Federated Unlearning via Isolated and Coded Sharding [76.12847512410767]
Federated unlearning has emerged as a promising paradigm to erase the effect of client-level data.
This paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing.
arXiv Detail & Related papers (2024-01-29T08:41:45Z)
- Client Orchestration and Cost-Efficient Joint Optimization for NOMA-Enabled Hierarchical Federated Learning [55.49099125128281]
We propose a non-orthogonal multiple access (NOMA) enabled HFL system under semi-synchronous cloud model aggregation.
We show that the proposed scheme outperforms the considered benchmarks in both HFL performance and total cost reduction.
arXiv Detail & Related papers (2023-11-03T13:34:44Z)
- Asynchronous Federated Learning with Incentive Mechanism Based on Contract Theory [5.502596101979607]
We propose a novel asynchronous FL framework that integrates an incentive mechanism based on contract theory.
Our framework exhibits a 1.35% accuracy improvement over the ideal Local SGD under attacks.
arXiv Detail & Related papers (2023-10-10T09:17:17Z)
- Momentum Benefits Non-IID Federated Learning Simply and Provably [22.800862422479913]
Federated learning is a powerful paradigm for large-scale machine learning.
FedAvg and SCAFFOLD are two prominent algorithms to address these challenges.
This paper explores the utilization of momentum to enhance the performance of FedAvg and SCAFFOLD.
arXiv Detail & Related papers (2023-06-28T18:52:27Z)
- Asynchronous Hierarchical Federated Learning [10.332084068006345]
Asynchronous hierarchical federated learning is proposed to address heavy server traffic, slow convergence, and unreliable accuracy.
A special aggregator device is selected to enable hierarchical learning, so that the burden of the server can be significantly reduced.
We evaluate the proposed algorithm on the CIFAR-10 image classification task.
arXiv Detail & Related papers (2022-05-31T18:42:29Z)
- Federated Stochastic Gradient Descent Begets Self-Induced Momentum [151.4322255230084]
Federated learning (FL) is an emerging machine learning method that can be applied in mobile edge systems.
We show that running stochastic gradient descent (SGD) in such a setting can be viewed as adding a momentum-like term to the global aggregation process.
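The self-induced momentum observation can be sketched with a toy example: K local SGD steps accumulate K gradients into a single pseudo-gradient that the server applies at once, much like an accumulated momentum buffer. This sketch (assumed setup on a scalar quadratic, not the paper's actual analysis) makes that accumulation explicit.

```python
import numpy as np

def local_round_delta(w, grad_fn, k_local=4, lr=0.1):
    # Illustrative sketch: one client runs K local SGD steps; the net change
    # (a sum of K gradients) is sent to the server as one pseudo-gradient.
    w_local = np.array(w, dtype=float)
    for _ in range(k_local):
        w_local = w_local - lr * grad_fn(w_local)
    return w - w_local  # accumulated update applied by the server in one shot

# toy quadratic f(w) = 0.5 * w**2, so grad f(w) = w
delta = local_round_delta(np.array([1.0]), lambda w: w)
```

On this quadratic the accumulated delta after 4 steps is 1 - 0.9**4, i.e. larger than a single gradient step, which is the momentum-like amplification the paper formalizes.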
arXiv Detail & Related papers (2022-02-17T02:01:37Z)
- Byzantine-robust Federated Learning through Spatial-temporal Analysis of Local Model Updates [6.758334200305236]
Federated Learning (FL) enables multiple distributed clients (e.g., mobile devices) to collaboratively train a centralized model while keeping the training data locally on the client.
In this paper, we propose to mitigate these failures and attacks from a spatial-temporal perspective.
Specifically, we use a clustering-based method to detect and exclude incorrect updates by leveraging their geometric properties in the parameter space.
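A minimal sketch of geometry-based update filtering in this spirit follows. This is a simplified median-distance filter, not the paper's exact clustering method; the function name and tolerance are illustrative assumptions.

```python
import numpy as np

def filter_updates(updates, tol=2.0):
    """Drop client updates that lie far from the coordinate-wise median.

    A simplified stand-in for clustering-based outlier detection: honest
    updates cluster together in parameter space, Byzantine ones do not.
    """
    U = np.asarray(updates)
    med = np.median(U, axis=0)                 # robust center of all updates
    dists = np.linalg.norm(U - med, axis=1)    # distance of each update to it
    scale = np.median(dists) + 1e-12           # robust distance scale
    return U[dists <= tol * scale]             # keep only nearby updates
```

A single extreme update barely shifts the coordinate-wise median, so it shows up with a huge distance and is excluded before aggregation.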
arXiv Detail & Related papers (2021-07-03T18:48:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.