On the Convergence of Decentralized Adaptive Gradient Methods
- URL: http://arxiv.org/abs/2109.03194v1
- Date: Tue, 7 Sep 2021 16:58:11 GMT
- Title: On the Convergence of Decentralized Adaptive Gradient Methods
- Authors: Xiangyi Chen, Belhal Karimi, Weijie Zhao, Ping Li
- Abstract summary: We introduce novel convergent decentralized adaptive gradient methods and rigorously incorporate adaptive gradient methods into decentralized training procedures.
Specifically, we propose a general algorithmic framework that can convert existing adaptive gradient methods to their decentralized counterparts.
We show that if a given adaptive gradient method converges under some specific conditions, then its decentralized counterpart is also convergent.
- Score: 27.15543843721437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adaptive gradient methods including Adam, AdaGrad, and their variants have
been very successful for training deep learning models, such as neural
networks. Meanwhile, given the need for distributed computing, distributed
optimization algorithms are rapidly becoming a focal point. With the growth of
computing power and the need for using machine learning models on mobile
devices, the communication cost of distributed training algorithms needs
careful consideration. In this paper, we introduce novel convergent
decentralized adaptive gradient methods and rigorously incorporate adaptive
gradient methods into decentralized training procedures. Specifically, we
propose a general algorithmic framework that can convert existing adaptive
gradient methods to their decentralized counterparts. In addition, we
thoroughly analyze the convergence behavior of the proposed algorithmic
framework and show that if a given adaptive gradient method converges under
some specific conditions, then its decentralized counterpart is also
convergent. We illustrate the benefit of our generic decentralized framework on
a prototype method, i.e., AMSGrad, both theoretically and numerically.
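To make the framework concrete, the sketch below shows one plausible instantiation of a decentralized AMSGrad step: each worker performs a local AMSGrad update on its own stochastic gradient and then gossip-averages its iterate with its neighbors through a doubly stochastic mixing matrix W. This is only an illustration under those assumptions; the paper's actual construction, and the conditions under which convergence transfers, may differ (for instance, by also keeping the adaptive second-moment terms consistent across nodes).

```python
# Minimal sketch (not the paper's exact algorithm): decentralized AMSGrad
# with a gossip-averaging consensus step over a mixing matrix W.
import numpy as np

def decentralized_amsgrad(grad_fn, W, x0, steps=1000,
                          lr=1e-3, beta1=0.9, beta2=0.99, eps=1e-8):
    """grad_fn(i, x) returns a stochastic gradient of worker i's loss at x;
    W is an (n, n) doubly stochastic mixing matrix over the network."""
    n, d = W.shape[0], x0.size
    x = np.tile(x0, (n, 1))       # one local iterate per worker
    m = np.zeros((n, d))          # first-moment estimates
    v = np.zeros((n, d))          # second-moment estimates
    v_hat = np.zeros((n, d))      # running max of v (the AMSGrad correction)

    for _ in range(steps):
        for i in range(n):
            g = grad_fn(i, x[i])
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            v_hat[i] = np.maximum(v_hat[i], v[i])   # non-decreasing scaling
            x[i] -= lr * m[i] / (np.sqrt(v_hat[i]) + eps)
        x = W @ x                 # gossip step: average iterates with neighbors
    return x.mean(axis=0)
```

In this toy version, W encodes the communication graph; the convergence guarantee in the paper rests on conditions on the base adaptive method that the sketch does not attempt to capture.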
Related papers
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\left( \ln(T) / T^{1 - \frac{1}{\alpha}} \right)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance reduction technique in the cross-silo FL setting.
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- On Accelerating Distributed Convex Optimizations [0.0]
This paper studies a distributed multi-agent convex optimization problem.
We show that the proposed algorithm converges linearly, with an improved rate of convergence over traditional and adaptive gradient-descent methods.
We demonstrate our algorithm's superior performance compared to prominent distributed algorithms on real logistic regression problems.
arXiv Detail & Related papers (2021-08-19T13:19:54Z)
- Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent [7.6860514640178]
We propose a novel zeroth-order optimization algorithm for distributed reinforcement learning.
It allows each agent to estimate its local gradient independently from cost evaluations alone, without the use of any consensus protocol (a generic zeroth-order gradient estimator is sketched after this list).
arXiv Detail & Related papers (2021-07-26T18:11:07Z)
- A general framework for decentralized optimization with first-order methods [11.50057411285458]
Decentralized optimization to minimize a finite sum of functions over a network of nodes has been a significant focus in control and signal processing research.
The emergence of sophisticated computing and large-scale data science needs has led to a resurgence of activity in this area.
We discuss decentralized first-order gradient methods, which have found tremendous success in control, signal processing, and machine learning problems.
arXiv Detail & Related papers (2020-09-12T17:52:10Z)
- Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
- IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method [64.15649345392822]
We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex.
Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method.
When coupled with accelerated gradient descent, our framework yields a novel primal algorithm whose convergence rate is optimal and matched by recently derived lower bounds.
arXiv Detail & Related papers (2020-06-11T18:49:06Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
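The asynchronous zeroth-order paper listed above relies on each agent estimating its gradient from cost evaluations alone. The snippet below is a minimal sketch of a generic two-point zeroth-order gradient estimator, included only to illustrate that idea; the paper's block coordinate variant, smoothing radius, and sampling scheme may differ.

```python
# Illustrative two-point zeroth-order gradient estimate (not the paper's
# exact estimator): perturb along random directions and use cost
# differences to approximate directional derivatives.
import numpy as np

def zo_gradient(cost_fn, x, mu=1e-2, num_dirs=10, rng=None):
    """Estimate the gradient of cost_fn at x from function values only."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros(x.size)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.size)                  # random direction
        delta = cost_fn(x + mu * u) - cost_fn(x - mu * u)
        g += (delta / (2.0 * mu)) * u                    # directional estimate
    return g / num_dirs
```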