Distributed and Stochastic Optimization Methods with Gradient
Compression and Local Steps
- URL: http://arxiv.org/abs/2112.10645v1
- Date: Mon, 20 Dec 2021 16:12:54 GMT
- Title: Distributed and Stochastic Optimization Methods with Gradient
Compression and Local Steps
- Authors: Eduard Gorbunov
- Abstract summary: We propose new theoretical frameworks for the analysis of stochastic and distributed methods with error compensation and local updates.
We develop more than 20 new optimization methods, including the first linearly converging Error-Compensated SGD and the first linearly converging Local-SGD for arbitrarily heterogeneous local functions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this thesis, we propose new theoretical frameworks for the analysis of
stochastic and distributed methods with error compensation and local updates.
Using these frameworks, we develop more than 20 new optimization methods,
including the first linearly converging Error-Compensated SGD and the first
linearly converging Local-SGD for arbitrarily heterogeneous local functions.
Moreover, the thesis contains several new distributed methods with unbiased
compression for distributed non-convex optimization problems. The derived
complexity results for these methods outperform the previous best-known results
for the considered problems. Finally, we propose a new scalable decentralized
fault-tolerant distributed method, and under reasonable assumptions, we derive
the iteration complexity bounds for this method that match the ones of
centralized Local-SGD.
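The error-feedback mechanism behind Error-Compensated SGD with gradient compression can be summarized in a few lines. The sketch below is a generic illustration under simplifying assumptions (a Top-K sparsifier as the compression operator and a user-supplied stochastic_grad oracle, both hypothetical helpers), not the thesis's exact algorithms.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ec_sgd_worker_step(x, e, stochastic_grad, lr=0.1, k=10):
    """One error-compensated step on a single worker: compress the
    error-corrected update, communicate the compressed message, and keep
    what was lost in the local error buffer e for the next round."""
    g = stochastic_grad(x)          # stochastic gradient at the current point
    corrected = lr * g + e          # add the error accumulated from past compressions
    message = top_k(corrected, k)   # compressed update that is actually communicated
    e = corrected - message         # error feedback: remember what compression dropped
    return message, e
```

On the server side, the compressed messages from all workers would be averaged and applied to the shared iterate; a Local-SGD-style variant would let each worker take several such steps between communication rounds before synchronizing.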
Related papers
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise [96.80184504268593]
Gradient clipping is one of the key algorithmic ingredients for deriving good high-probability guarantees; however, clipping can spoil the convergence of popular methods for composite and distributed optimization (a minimal sketch of the clipping step appears after this list).
arXiv Detail & Related papers (2023-10-03T07:49:17Z) - Resource-Adaptive Newton's Method for Distributed Learning [16.588456212160928]
This paper introduces a novel and efficient algorithm called RANL, which overcomes the limitations of Newton's method.
Unlike traditional first-order methods, RANL exhibits remarkable independence from the condition number of the problem.
arXiv Detail & Related papers (2023-08-20T04:01:30Z) - Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient
Methods [73.35353358543507]
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs).
In this paper, we propose a unified convergence analysis that covers a large variety of descent-ascent methods.
We develop several new variants of SGDA, such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA).
arXiv Detail & Related papers (2022-02-15T09:17:39Z) - Distributionally Robust Federated Averaging [19.875176871167966]
We present communication-efficient distributed algorithms for distributionally robust federated learning with periodic averaging and adaptive sampling.
We give corroborating experimental evidence for our theoretical results in federated learning settings.
arXiv Detail & Related papers (2021-02-25T03:32:09Z) - IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method [64.15649345392822]
We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex.
Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method.
When coupled with accelerated gradient descent, our framework yields a novel primal algorithm whose convergence rate is optimal and matched by recently derived lower bounds.
arXiv Detail & Related papers (2020-06-11T18:49:06Z) - FedSplit: An algorithmic framework for fast federated optimization [40.42352500741025]
We introduce FedSplit, a class of algorithms for solving distributed convex minimization with additive structure.
Our theory shows that these methods are provably robust to inexact computation of intermediate local quantities.
arXiv Detail & Related papers (2020-05-11T16:30:09Z) - Optimization in Machine Learning: A Distribution Space Approach [16.038814087205143]
We present the viewpoint that optimization problems in machine learning can often be interpreted as minimizing a convex functional over a function space.
We re-pose such problems, via a suitable relaxation, as convex optimization problems in the space of distributions.
We develop a numerical algorithm based on mixture distributions to perform approximate optimization directly in distribution space.
arXiv Detail & Related papers (2020-04-18T13:38:06Z) - A Unified Theory of Decentralized SGD with Changing Topology and Local
Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z) - Majorization Minimization Methods for Distributed Pose Graph
Optimization with Convergence Guarantees [0.76146285961466]
We show that our proposed methods are guaranteed to converge to first-order critical points under mild conditions.
Since our proposed methods rely on a proximal operator of distributed PGO, the convergence rate can be significantly accelerated.
The efficacy of this work is validated through applications on a number of 2D and 3D SLAM datasets.
arXiv Detail & Related papers (2020-03-11T15:18:33Z) - Distributed Averaging Methods for Randomized Second Order Optimization [54.51566432934556]
We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a bottleneck.
We develop unbiased parameter averaging methods for randomized second order optimization that employ sampling and sketching of the Hessian.
We also extend the framework of second order averaging methods to introduce an unbiased distributed optimization framework for heterogeneous computing systems.
arXiv Detail & Related papers (2020-02-16T09:01:18Z)
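As referenced in the High-Probability Convergence entry above, gradient clipping simply rescales a stochastic gradient whose norm exceeds a threshold. The snippet below is a minimal, hedged sketch of that operator inside a plain SGD step; the threshold lam and the stochastic_grad oracle are illustrative assumptions, not that paper's exact method.

```python
import numpy as np

def clip(g, lam):
    """Rescale g to have norm lam whenever its norm exceeds lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else (lam / norm) * g

def clipped_sgd_step(x, stochastic_grad, lr=0.01, lam=1.0):
    """One SGD step that uses the clipped stochastic gradient, limiting the
    influence of heavy-tailed noise on any single update."""
    return x - lr * clip(stochastic_grad(x), lam)
```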