Asynchronous Distributed Optimization with Delay-free Parameters
- URL: http://arxiv.org/abs/2312.06508v1
- Date: Mon, 11 Dec 2023 16:33:38 GMT
- Title: Asynchronous Distributed Optimization with Delay-free Parameters
- Authors: Xuyang Wu, Changxin Liu, Sindri Magnusson, and Mikael Johansson
- Abstract summary: This paper develops asynchronous versions of two distributed algorithms, Prox-DGD and DGD-ATC, for solving consensus optimization problems over undirected networks.
In contrast to alternatives, our algorithms can converge to the fixed point set of their synchronous counterparts using step-sizes that are independent of the delays.
We also show that the convergence speed of the two asynchronous methods adapts to the actual level of asynchrony rather than being constrained by the worst-case.
- Score: 9.062164411594175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing asynchronous distributed optimization algorithms often use
diminishing step-sizes that cause slow practical convergence, or use fixed
step-sizes that depend on and decrease with an upper bound of the delays. Not
only are such delay bounds hard to obtain in advance, but they also tend to be
large and rarely attained, resulting in unnecessarily slow convergence. This
paper develops asynchronous versions of two distributed algorithms, Prox-DGD
and DGD-ATC, for solving consensus optimization problems over undirected
networks. In contrast to alternatives, our algorithms can converge to the fixed
point set of their synchronous counterparts using step-sizes that are
independent of the delays. We establish convergence guarantees for strongly and
weakly convex problems under both partial and total asynchrony. We also show
that the convergence speed of the two asynchronous methods adapts to the actual
level of asynchrony rather than being constrained by the worst-case. Numerical
experiments demonstrate a strong practical performance of our asynchronous
algorithms.
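As a rough illustration of the mechanism, here is a minimal simulation sketch of an asynchronous DGD-style iteration with a fixed, delay-independent step-size. The ring topology, quadratic objectives, mixing weights, and randomized delay model are all illustrative assumptions; this is not the paper's exact Prox-DGD or DGD-ATC implementation.

```python
import numpy as np

# A minimal sketch (illustrative assumptions throughout): n nodes on a ring
# minimize sum_i f_i(x) with f_i(x) = 0.5 * (x - a_i)^2, each waking at
# arbitrary times and updating from possibly stale neighbor values with a
# fixed step-size chosen without reference to any delay bound.
rng = np.random.default_rng(0)
n = 5
a = rng.normal(size=n)                  # local data; the consensus optimum is a.mean()
gamma = 0.5                             # step-size: no delay bound enters this choice

# Symmetric, doubly stochastic mixing matrix for the ring graph.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

x = np.zeros(n)                         # current local iterates
stale = np.tile(x, (n, 1))              # stale[i, j]: node i's last received copy of x_j

for _ in range(5000):
    i = rng.integers(n)                 # an arbitrary node wakes up (asynchrony)
    stale[i, i] = x[i]                  # a node always knows its own latest value
    # DGD-style update built from possibly outdated neighbor information
    x[i] = W[i] @ stale[i] - gamma * (x[i] - a[i])
    # messages arrive after random delays: copies of x_i refresh only sporadically
    for j in range(n):
        if rng.random() < 0.3:
            stale[j, i] = x[i]

print(x)                                # iterates settle near the synchronous DGD fixed point
```

As in the synchronous case, the fixed step-size steers the iterates to the fixed-point set of the synchronous method (close to, but not exactly at, the minimizer), and no bound on the message delays is needed to set gamma.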
Related papers
- Freya PAGE: First Optimal Time Complexity for Large-Scale Nonconvex Finite-Sum Optimization with Heterogeneous Asynchronous Computations [92.1840862558718]
In practical distributed systems, workers are typically not homogeneous and can have highly varying processing times.
We introduce a new parallel method, Freya PAGE, to handle arbitrarily slow computations.
We show that Freya PAGE offers significantly improved complexity guarantees compared to all previous methods.
arXiv Detail & Related papers (2024-05-24T13:33:30Z)
- DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speed up a common stochastic approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose DASA, a delay-adaptive algorithm for multi-agent stochastic approximation.
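The snippet above only names the setting; as a point of reference, a generic server-side loop with heterogeneous gradient delays (our hypothetical skeleton, not DASA's actual update rule) looks roughly as follows.

```python
import numpy as np

# Hypothetical skeleton of the server-worker setting (not DASA's actual rule):
# workers compute stochastic gradients at old server iterates, and the server
# applies whatever arrives, so every update may use a stale point.
rng = np.random.default_rng(1)
dim, T = 10, 2000
target = rng.normal(size=dim)           # toy problem: minimize 0.5 * ||x - target||^2

history = [np.zeros(dim)]               # past server iterates, indexed by iteration
x = history[0].copy()
for t in range(T):
    delay = int(rng.integers(0, 10))    # heterogeneous, time-varying delay
    stale_x = history[max(0, t - delay)]
    g = (stale_x - target) + 0.1 * rng.normal(size=dim)  # noisy gradient at a stale iterate
    x = x - 0.05 * g                    # plain delayed SGD step with a fixed step-size
    history.append(x.copy())
```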
arXiv Detail & Related papers (2024-03-25T22:49:56Z)
- Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent [63.43247232708004]
Stochastic gradient descent performed in an asynchronous manner plays a crucial role in training large-scale machine learning models.
Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization.
Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm.
arXiv Detail & Related papers (2023-08-18T10:00:27Z)
- Robust Fully-Asynchronous Methods for Distributed Training over General Architecture [11.480605289411807]
Perfect synchronization in distributed machine learning problems is inefficient and even impossible due to the existence of latency, packet losses and stragglers.
We propose a Robust Fully-Asynchronous Gradient Tracking method (R-FAST), where each device performs local computation and communication at its own pace without any form of synchronization.
arXiv Detail & Related papers (2023-07-21T14:36:40Z)
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
- Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays [8.46491234455848]
We prove much better guarantees for the same asynchronous SGD method regardless of the delays in the steps, depending instead just on the number of parallel workers.
For our analysis, we introduce a novel recursion based on "virtual iterates" and delay-adaptive step-sizes, which allows us to derive state-of-the-art guarantees for both convex and non-convex objectives.
arXiv Detail & Related papers (2022-06-15T16:28:37Z)
- Delay-adaptive step-sizes for asynchronous learning [8.272788656521415]
We show that it is possible to use learning rates that depend on the actual time-varying delays in the system.
For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state-of-the-art.
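A minimal sketch of what such a policy can look like (the specific rule below is an assumption for illustration, not necessarily the paper's policy): each gradient message carries the iteration index at which it was computed, so the receiver can measure the delay on-line and shrink the step accordingly.

```python
# Assumed illustrative policy, not necessarily the paper's exact rule: tag each
# gradient with the iteration it was computed at, measure the delay on receipt,
# and damp the step for very stale gradients.
def delay_adaptive_step(base_lr: float, current_iter: int, computed_at: int) -> float:
    tau = current_iter - computed_at    # actual, measured delay of this gradient
    return base_lr / (1.0 + tau)        # fresh gradients take full steps; stale ones are damped

# e.g. a gradient computed at iteration 90 and applied at iteration 100
# is scaled by base_lr / 11, rather than by a worst-case delay bound.
```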
arXiv Detail & Related papers (2022-02-17T09:51:22Z)
- Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees [10.984101749941471]
We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms.
The results are simple to apply and give explicit estimates of how the degree of asynchrony impacts the convergence rates of the iterates.
arXiv Detail & Related papers (2021-09-09T19:08:56Z)
- Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum under mild assumptions on the delays.
arXiv Detail & Related papers (2021-07-06T21:59:49Z)
- Decentralized Optimization with Heterogeneous Delays: a Continuous-Time Approach [6.187780920448871]
We propose a novel continuous-time framework to analyze asynchronous algorithms.
We describe a fully asynchronous decentralized algorithm to minimize the sum of smooth and strongly convex functions.
arXiv Detail & Related papers (2021-06-07T13:09:25Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)