Federated Learning with Nesterov Accelerated Gradient Momentum Method
- URL: http://arxiv.org/abs/2009.08716v1
- Date: Fri, 18 Sep 2020 09:38:11 GMT
- Title: Federated Learning with Nesterov Accelerated Gradient Momentum Method
- Authors: Zhengjie Yang, Wei Bao, Dong Yuan, Nguyen H. Tran, and Albert Y.
Zomaya
- Abstract summary: Federated learning (FL) is a fast-developing technique that allows multiple workers to train a global model based on a distributed dataset.
It is well known that Nesterov Accelerated Gradient (NAG) is more advantageous in a centralized training environment.
In this work, we focus on a version of FL based on NAG and provide a detailed convergence analysis.
- Score: 47.49442006239034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) is a fast-developing technique that allows multiple
workers to train a global model based on a distributed dataset. Conventional FL
employs the gradient descent algorithm, which may not be efficient enough. It is
well known that Nesterov Accelerated Gradient (NAG) is more advantageous in a
centralized training environment, but so far it has not been clear how to quantify
the benefits of NAG in FL. In this work, we focus on a version of FL based
on NAG (FedNAG) and provide a detailed convergence analysis. The result is
compared with conventional FL based on gradient descent. One interesting
conclusion is that as long as the learning step size is sufficiently small,
FedNAG outperforms FedAvg. Extensive experiments based on real-world datasets
are conducted, verifying our conclusions and confirming the better convergence
performance of FedNAG.
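The abstract describes FedNAG only at a high level. Below is a minimal, illustrative sketch (Python/NumPy, toy linear-regression clients) of what swapping local gradient descent for Nesterov Accelerated Gradient in a FedAvg-style loop could look like. The assumption that the server averages the local momentum buffers alongside the local weights is inferred from the method's name and is not a statement of the paper's exact aggregation rule; all function names and hyperparameters here are illustrative.

```python
import numpy as np

def local_nag_update(w, v, X, y, lr=0.01, momentum=0.9, local_steps=5):
    """One client's local training with Nesterov Accelerated Gradient.

    w: model weights, v: momentum buffer, (X, y): the client's local data.
    The gradient is evaluated at the look-ahead point w + momentum * v,
    which is the defining feature of NAG.
    """
    for _ in range(local_steps):
        lookahead = w + momentum * v
        grad = 2 * X.T @ (X @ lookahead - y) / len(y)  # least-squares gradient
        v = momentum * v - lr * grad
        w = w + v
    return w, v

def fednag_round(w_global, v_global, client_data, **kwargs):
    """One communication round: broadcast (w, v), train locally, average both.

    Averaging the momentum buffers together with the weights is an assumption
    made for illustration; the paper defines its own aggregation rule.
    """
    results = [local_nag_update(w_global.copy(), v_global.copy(), X, y, **kwargs)
               for X, y in client_data]
    w_new = np.mean([w for w, _ in results], axis=0)
    v_new = np.mean([v for _, v in results], axis=0)
    return w_new, v_new

# Toy run: two clients holding different random least-squares problems.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(2)]
w, v = np.zeros(3), np.zeros(3)
for _ in range(20):
    w, v = fednag_round(w, v, clients)
```

Setting momentum=0 in this sketch reduces the local update to plain gradient descent, i.e. a FedAvg-style baseline, which makes the paper's small-step-size comparison between FedNAG and FedAvg easy to reproduce at toy scale.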
Related papers
- FedNAR: Federated Optimization with Normalized Annealing Regularization [54.42032094044368]
We explore the choices of weight decay and identify that the weight decay value appreciably influences the convergence of existing FL algorithms.
We develop Federated Optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithm.
arXiv Detail & Related papers (2023-10-04T21:11:40Z)
- Understanding How Consistency Works in Federated Learning via Stage-wise Relaxed Initialization [84.42306265220274]
Federated learning (FL) is a distributed paradigm that coordinates massive local clients to collaboratively train a global model.
Previous works have implicitly studied that FL suffers from the "client-drift" problem, which is caused by inconsistent local optima across clients.
To alleviate the negative impact of client drift and explore its substance in FL, we first design an efficient FL algorithm, FedInit.
arXiv Detail & Related papers (2023-06-09T06:55:15Z)
- FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted Dual Averaging [104.41634756395545]
Federated learning (FL) is an emerging learning paradigm to tackle massively distributed data.
We propose FedDA, a novel framework for local adaptive gradient methods.
We show that FedDA-MVR is the first adaptive FL algorithm that achieves this rate.
arXiv Detail & Related papers (2023-02-13T05:10:30Z)
- Why Batch Normalization Damage Federated Learning on Non-IID Data? [34.06900591666005]
Federated learning (FL) involves training deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients.
Batch normalization (BN) has been regarded as a simple and effective means to accelerate training and improve generalization capability.
Recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data.
We present the first convergence analysis to show that, under non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes gradient deviation between the local and global models; a toy illustration of this statistics mismatch follows this entry.
arXiv Detail & Related papers (2023-01-08T05:24:12Z)
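The batch-normalization entry above attributes the degradation to a mismatch between local and global BN statistics under non-i.i.d. data. The snippet below is not the paper's analysis; it is a small, self-contained illustration (assumed two-class setup, NumPy only) of how each client's BN mean and variance drift away from the global statistics when every client mostly sees one class.

```python
import numpy as np

rng = np.random.default_rng(1)

# A "global" feature distribution: a mixture of two classes with different means.
class_means = [-2.0, 2.0]
global_features = np.concatenate(
    [rng.normal(m, 1.0, size=5000) for m in class_means])

# Non-i.i.d. split: each client almost exclusively sees one class.
client_a = rng.normal(class_means[0], 1.0, size=5000)
client_b = rng.normal(class_means[1], 1.0, size=5000)

def bn_stats(x):
    """The (mean, variance) a BN layer would estimate from these activations."""
    return x.mean(), x.var()

print("global  :", bn_stats(global_features))  # mean ~  0.0, var ~ 5.0
print("client A:", bn_stats(client_a))         # mean ~ -2.0, var ~ 1.0
print("client B:", bn_stats(client_b))         # mean ~ +2.0, var ~ 1.0
```

Because each client normalizes its activations with its own skewed statistics, locally computed gradients refer to differently normalized models, which is, informally, the local-versus-global gradient deviation the entry's convergence analysis formalizes.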
- ISFL: Federated Learning for Non-i.i.d. Data with Local Importance Sampling [17.29669920752378]
We propose importance sampling federated learning (ISFL), an explicit framework with theoretical guarantees.
We derive a convergence theorem for ISFL that accounts for the effects of local importance sampling.
We employ a water-filling method to calculate the IS weights and develop the ISFL algorithms.
arXiv Detail & Related papers (2022-10-05T09:43:58Z)
- Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning [86.59588262014456]
Federated Learning (FL) is an emerging distributed learning paradigm under privacy constraint.
We propose a data-free knowledge distillation method to fine-tune the global model on the server (FedFTG).
Our FedFTG significantly outperforms the state-of-the-art (SOTA) FL algorithms and can serve as a strong plugin for enhancing FedAvg, FedProx, FedDyn, and SCAFFOLD.
arXiv Detail & Related papers (2022-03-17T11:18:17Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Accelerating Federated Learning with a Global Biased Optimiser [16.69005478209394]
Federated Learning (FL) is a recent development in the field of machine learning that collaboratively trains models without the training data leaving client devices.
We propose a novel, generalised approach for applying adaptive optimisation techniques to FL with the Federated Global Biased Optimiser (FedGBO) algorithm.
FedGBO accelerates FL by applying a set of global biased optimiser values during the local training phase of FL, which helps to reduce client drift from non-IID data; a rough sketch of this idea follows this entry.
arXiv Detail & Related papers (2021-08-20T12:08:44Z)
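The FedGBO entry gives only the idea of reusing globally aggregated optimiser values during local training. The sketch below is one plausible reading of that sentence, not the paper's algorithm: each client runs local steps whose updates are biased by a fixed, server-provided momentum buffer, and the server refreshes that buffer from the round's averaged pseudo-gradient. The toy model, the server-side momentum update, and all names and hyperparameters here are assumptions for illustration.

```python
import numpy as np

def local_steps_with_global_momentum(w, m_global, X, y,
                                     lr=0.01, beta=0.9, local_steps=5):
    """Local SGD steps biased by a fixed, globally aggregated momentum buffer.

    m_global is broadcast by the server and held constant during local
    training; only the weights w are updated on the client.
    """
    for _ in range(local_steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # least-squares gradient
        w = w - lr * (grad + beta * m_global)   # update biased by global momentum
    return w

def fedgbo_like_round(w_global, m_global, client_data,
                      lr=0.01, beta=0.9, local_steps=5):
    """One round: broadcast (w, m), train locally, average weights, then
    refresh the global momentum from the round's averaged pseudo-gradient.

    This server-side momentum rule is an assumption made for illustration;
    the paper defines its own update for the global optimiser values.
    """
    local_ws = [local_steps_with_global_momentum(
                    w_global.copy(), m_global, X, y,
                    lr=lr, beta=beta, local_steps=local_steps)
                for X, y in client_data]
    w_new = np.mean(local_ws, axis=0)
    pseudo_grad = (w_global - w_new) / (lr * local_steps)  # avg descent direction
    m_new = beta * m_global + (1 - beta) * pseudo_grad
    return w_new, m_new

# Toy run: two clients holding different random least-squares problems.
rng = np.random.default_rng(2)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(2)]
w, m = np.zeros(3), np.zeros(3)
for _ in range(20):
    w, m = fedgbo_like_round(w, m, clients)
```

Because the bias term comes from a single shared buffer rather than from per-client state, every client is pulled in the same direction during local training, which is the intuition behind reducing client drift on non-IID data.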
- FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Convergence Analysis [27.022551495550676]
This paper presents a new class of convergence analysis for FL, Federated Learning Neural Tangent Kernel (FL-NTK), which corresponds to overparameterized ReLU neural networks trained by gradient descent in FL.
Theoretically, FL-NTK converges to a global-optimal solution at a linear rate with properly tuned learning parameters.
arXiv Detail & Related papers (2021-05-11T13:05:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.