A Convergence Theory for Federated Average: Beyond Smoothness
- URL: http://arxiv.org/abs/2211.01588v1
- Date: Thu, 3 Nov 2022 04:50:49 GMT
- Title: A Convergence Theory for Federated Average: Beyond Smoothness
- Authors: Xiaoxiao Li, Zhao Song, Runzhou Tao, Guangyi Zhang
- Abstract summary: Federated learning enables a large number of edge computing devices to jointly learn a model without data sharing.
As a leading algorithm in this setting, Federated Average (FedAvg), which runs Stochastic Gradient Descent (SGD) in parallel on local devices, has been widely used.
This paper provides a theoretical convergence study on Federated Learning.
- Score: 28.074273047592065
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Federated learning enables a large number of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Average (FedAvg), which runs Stochastic Gradient Descent (SGD) in parallel on local devices and averages the resulting iterates only once in a while, has been widely used due to its simplicity and low communication cost. However, despite recent research efforts, FedAvg lacks theoretical analysis under assumptions beyond smoothness. In this paper, we analyze the convergence of FedAvg. Different from existing work, we relax the assumption of strong smoothness. More specifically, we assume semi-smoothness and semi-Lipschitz properties for the loss function, whose definitions contain an additional first-order term. In addition, we assume a bound on the gradient that is weaker than the bounded-gradient assumption commonly used in convergence analyses. Under these relaxed assumptions, this paper provides a theoretical convergence study of Federated Learning.
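To give a sense of what "an additional first-order term" means, one semi-smoothness condition of the kind used in the over-parameterized network literature can be written schematically as below; the constants $c_1, c_2$ and the exact form are illustrative assumptions here, not the paper's precise statement.

```latex
% Illustrative (assumed) form of a semi-smoothness condition; standard
% L-smoothness keeps only the quadratic term, while the extra first-order
% term c_1 \sqrt{L(w)} \|w' - w\| is what the "semi" relaxation adds.
\[
  L(w') \le L(w) + \langle \nabla L(w),\, w' - w \rangle
          + c_1 \sqrt{L(w)}\, \lVert w' - w \rVert_2
          + c_2 \lVert w' - w \rVert_2^2 .
\]
```

The FedAvg procedure the abstract describes (parallel local SGD with only occasional averaging) can be sketched as a minimal single-process simulation; the function names, hyperparameters, and toy data below are placeholders for illustration, not a faithful federated implementation.

```python
import numpy as np

def fedavg(grad_fn, w0, num_devices, local_steps, rounds, lr, data):
    """Minimal FedAvg sketch: each device runs `local_steps` of SGD on its
    own data, then the server averages the local iterates once per round."""
    w_global = np.array(w0, dtype=float)
    for _ in range(rounds):
        local_weights = []
        for k in range(num_devices):
            w = w_global.copy()                      # start from the shared model
            for _ in range(local_steps):             # local SGD, no communication
                sample = data[k][np.random.randint(len(data[k]))]
                w -= lr * grad_fn(w, sample)         # one stochastic gradient step
            local_weights.append(w)
        w_global = np.mean(local_weights, axis=0)    # averaging "once in a while"
    return w_global

# toy usage: least-squares on synthetic per-device data (hypothetical example)
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
data = [[(a, a @ true_w + 0.01 * rng.standard_normal())
         for a in rng.standard_normal((20, 2))] for _ in range(4)]
grad = lambda w, xy: 2 * (xy[0] @ w - xy[1]) * xy[0]
print(fedavg(grad, np.zeros(2), num_devices=4, local_steps=10,
             rounds=50, lr=0.05, data=data))
```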
Related papers
- A Unified Analysis for Finite Weight Averaging [50.75116992029417]
Averaging the iterates of Stochastic Gradient Descent (SGD) has achieved empirical success in training deep learning models, through schemes such as Stochastic Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA).
In this paper, we generalize LAWA as Finite Weight Averaging (FWA) and explain its advantages over SGD from the perspectives of optimization and generalization.
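As an illustration of the kind of averaging these schemes perform, a finite-window average over the latest checkpoints can be sketched as follows; the window size `k` and the checkpoint list are assumptions for the example, not the exact FWA scheme of the paper.

```python
import numpy as np

def finite_weight_average(checkpoints, k):
    """Average the latest k weight checkpoints (a LAWA/FWA-style scheme)."""
    window = checkpoints[-k:]          # keep only the most recent k iterates
    return np.mean(window, axis=0)     # uniform average over the window

# hypothetical usage: checkpoints collected during SGD training
checkpoints = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([0.9, 2.1])]
print(finite_weight_average(checkpoints, k=2))   # average of the last two
```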
arXiv Detail & Related papers (2024-11-20T10:08:22Z) - Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence [25.717501580080846]
This paper revisits the convergence of Stochastic Mirror Descent (SMD) in the contemporary nonconvex optimization setting.
For the problem of training linear networks, we develop provably convergent algorithms.
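For context, the standard stochastic mirror descent update with a Bregman divergence $D_\psi$ generated by a mirror map $\psi$ is shown below; the step size $\eta_t$ and mirror map are generic, and this is the textbook form rather than the specific algorithms of the paper.

```latex
% Textbook stochastic mirror descent step with stochastic gradient g_t,
% step size \eta_t, and Bregman divergence D_\psi of the mirror map \psi.
\[
  x_{t+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}}
            \Big\{ \langle g_t, x \rangle + \tfrac{1}{\eta_t}\, D_\psi(x, x_t) \Big\},
  \qquad
  D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla \psi(y),\, x - y \rangle .
\]
```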
arXiv Detail & Related papers (2024-02-27T17:56:49Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $\epsilon$-stationary points with $O(\epsilon^{-4})$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
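As a reminder of the convergence measure used here, the standard definition of an $\epsilon$-stationary point in the smooth nonconvex stochastic setting is given below, with gradient complexity counting stochastic gradient evaluations; this is the usual convention, assumed to match the paper's usage.

```latex
% Standard definition: x is an \epsilon-stationary point of a smooth nonconvex f
% when its (expected) gradient norm is at most \epsilon; gradient complexity
% counts the stochastic gradient evaluations needed to reach such a point.
\[
  \mathbb{E}\,\lVert \nabla f(x) \rVert \le \epsilon ,
  \qquad
  \text{complexity } O(\epsilon^{-4}) = \text{\# stochastic gradient evaluations}.
\]
```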
arXiv Detail & Related papers (2023-04-27T06:27:37Z) - A simplified convergence theory for Byzantine resilient stochastic gradient descent [0.0]
In distributed learning, a central server trains a model according to updates provided by nodes holding local data samples.
In the presence of one or more malicious nodes, standard algorithms for model training such as stochastic gradient descent (SGD) fail to converge.
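One common way to make the server's aggregation step resilient is to replace the plain mean of worker updates with a robust statistic; the coordinate-wise median below is a generic illustration of this idea, not necessarily the estimator analyzed in this paper.

```python
import numpy as np

def robust_aggregate(updates):
    """Coordinate-wise median of worker updates: a single Byzantine worker
    cannot drag any coordinate arbitrarily far, unlike the plain mean."""
    return np.median(np.stack(updates), axis=0)

# hypothetical round: three honest gradients plus one corrupted update
honest = [np.array([0.9, -1.1]), np.array([1.0, -1.0]), np.array([1.1, -0.9])]
byzantine = [np.array([1e6, 1e6])]           # malicious node sends garbage
print(robust_aggregate(honest + byzantine))  # stays close to the honest values
print(np.mean(honest + byzantine, axis=0))   # the plain mean is ruined
```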
arXiv Detail & Related papers (2022-08-25T05:37:14Z) - On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data [39.600069116159695]
Existing theory predicts that data heterogeneity will degrade the performance of the Federated Averaging (FedAvg) algorithm in federated learning.
This paper explains the seemingly unreasonable effectiveness of FedAvg that contradicts the previous theoretical predictions.
arXiv Detail & Related papers (2022-06-09T18:25:25Z) - Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective [49.17352150219212]
Federated Averaging (FedAvg), also known as Local SGD, is one of the most popular algorithms in Federated Learning (FL).
We show how to analyze this quantity from the Stochastic Differential Equation (SDE) perspective.
arXiv Detail & Related papers (2021-11-05T22:16:11Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
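The without-replacement sampling these variants rely on (Random Reshuffling) can be sketched as follows; the loss, gradient, and learning-rate names are placeholders for the example.

```python
import numpy as np

def sgd_random_reshuffling(grad_fn, w0, data, epochs, lr):
    """SGD with Random Reshuffling: each epoch visits every sample exactly
    once in a freshly shuffled order, i.e. gradients are drawn without
    replacement rather than sampled i.i.d. with replacement."""
    w = np.array(w0, dtype=float)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(len(data)):   # fresh shuffle every epoch
            w -= lr * grad_fn(w, data[i])      # one pass, no sample repeated
    return w

# toy usage: 1-D least squares, gradient of (w*x - y)^2 (hypothetical data)
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
grad = lambda w, xy: 2 * (w * xy[0] - xy[1]) * xy[0]
print(sgd_random_reshuffling(grad, w0=0.0, data=data, epochs=50, lr=0.02))
```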
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - On Convergence of Training Loss Without Reaching Stationary Points [62.41370821014218]
We show that Neural Network weight variables do not converge to stationary points where the gradient of the loss function vanishes.
We propose a new perspective based on the ergodic theory of dynamical systems.
arXiv Detail & Related papers (2021-10-12T18:12:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.