Asynchronous Stochastic Optimization Robust to Arbitrary Delays
- URL: http://arxiv.org/abs/2106.11879v1
- Date: Tue, 22 Jun 2021 15:50:45 GMT
- Title: Asynchronous Stochastic Optimization Robust to Arbitrary Delays
- Authors: Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain
- Abstract summary: We consider stochastic optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale stochastic gradient from step $t - d_t$ for some arbitrary delay $d_t$.
Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
- Score: 54.61797739710608
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider stochastic optimization with delayed gradients where, at each
time step $t$, the algorithm makes an update using a stale stochastic gradient
from step $t - d_t$ for some arbitrary delay $d_t$. This setting abstracts
asynchronous distributed optimization where a central server receives gradient
updates computed by worker machines. These machines can experience computation
and communication loads that might vary significantly over time. In the general
non-convex smooth optimization setting, we give a simple and efficient
algorithm that requires $O( \sigma^2/\epsilon^4 + \tau/\epsilon^2 )$ steps for
finding an $\epsilon$-stationary point $x$, where $\tau$ is the \emph{average}
delay $\smash{\frac{1}{T}\sum_{t=1}^T d_t}$ and $\sigma^2$ is the variance of
the stochastic gradients. This improves over previous work, which showed that
stochastic gradient descent achieves the same rate but with respect to the
\emph{maximal} delay $\max_{t} d_t$, which can be significantly larger than the
average delay, especially in heterogeneous distributed systems. Our experiments
demonstrate the efficacy and robustness of our algorithm in cases where the
delay distribution is skewed or heavy-tailed.
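To make the setting concrete, the following minimal Python sketch (an illustration of the delayed-gradient setting, not the paper's algorithm) runs SGD where the update at step $t$ uses a stochastic gradient evaluated at the stale iterate from step $t - d_t$; the toy objective, step size, and heavy-tailed delay distribution are assumptions made for this example.
```python
import numpy as np

# Minimal sketch of the *setting* (not the paper's algorithm): SGD where the
# update at step t uses a stochastic gradient evaluated at the stale iterate
# x_{t - d_t}, with an arbitrary (here heavy-tailed) delay d_t.
# The quadratic objective, step size, and delay law are illustrative choices.

rng = np.random.default_rng(0)

def stochastic_grad(x, sigma=1.0):
    """Noisy gradient of the toy objective f(x) = 0.5 * ||x||^2."""
    return x + sigma * rng.standard_normal(x.shape)

def delayed_sgd(T=1000, dim=10, eta=0.05):
    x = rng.standard_normal(dim)
    iterates = [x.copy()]                     # keep past iterates so stale gradients can be formed
    for t in range(T):
        d_t = min(int(rng.pareto(1.5)), t)    # heavy-tailed delay, capped at t
        stale_x = iterates[t - d_t]           # iterate a worker saw at step t - d_t
        g = stochastic_grad(stale_x)          # delayed stochastic gradient
        x = x - eta * g                       # server applies the stale gradient
        iterates.append(x.copy())
    return x

if __name__ == "__main__":
    print("final ||x|| =", np.linalg.norm(delayed_sgd()))
```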
Related papers
- Faster Stochastic Optimization with Arbitrary Delays via Asynchronous Mini-Batching [25.95786289683455]
We show a method that automatically adapts to all delay quantiles simultaneously, without any prior knowledge of the delays, achieving convergence rates of the form $O(\inf_q \tau_q^2/(qT)^2 + \sigma/\sqrt{qT})$ for non-convex problems and $O(\tau_q^2/(qT)^2 + \sigma/\sqrt{qT})$ for convex problems.
arXiv Detail & Related papers (2024-08-14T12:30:51Z) - Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization [38.67914746910537]
We prove an $\mathcal{O}(t^{-1})$ lower bound rate for the OT map, using the similarity between Laguerre cell estimation and density support estimation.
To nearly achieve the desired fast rate, we design an entropic regularization scheme decreasing with the number of samples.
arXiv Detail & Related papers (2024-05-23T11:46:03Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale, and we demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Non-stationary Online Convex Optimization with Arbitrary Delays [50.46856739179311]
This paper investigates the delayed online convex optimization (OCO) in non-stationary environments.
We first propose a simple algorithm, namely DOGD, which performs a gradient descent step for each delayed gradient according to their arrival order (a toy sketch of this arrival-order rule appears after this list).
We then develop an improved algorithm, which reduces the dynamic regret bounds achieved by DOGD to $O(\sqrt{\bar{d}T(P_T+1)})$.
arXiv Detail & Related papers (2023-05-20T07:54:07Z) - Improved Rate of First Order Algorithms for Entropic Optimal Transport [2.1485350418225244]
This paper improves the state-of-the-art rate of a first-order algorithm for solving entropy regularized optimal transport.
We propose an accelerated primal-dual mirror descent algorithm with variance reduction.
Our algorithm may inspire more research to develop accelerated primal-dual algorithms that have rate $\widetilde{O}(n^2/\epsilon)$ for solving OT.
arXiv Detail & Related papers (2023-01-23T19:13:25Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We study a training algorithm for distributed computation with workers that have varying communication frequency.
In this work, we obtain a tighter convergence rate of $\mathcal{O}\!\left(\sigma^2\epsilon^{-2} + \tau_{\max}\tau_{avg}\epsilon^{-1}\right)$.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in its iterations caused by inaccurate solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z)
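As referenced in the delayed-OCO entry above, here is a toy sketch of the DOGD-style arrival-order rule: each gradient is queried at the iterate played in its round, arrives after an arbitrary delay, and is applied as its own descent step in the order it arrives. The drifting quadratic losses, delay law, and step size are assumptions made for illustration, not taken from the paper.
```python
import numpy as np

# Toy illustration of the arrival-order rule described for DOGD above (delayed
# online gradient descent): each gradient is queried at the iterate played in
# its round, arrives after an arbitrary delay, and is applied as a separate
# descent step in the order it arrives. Losses, delays, and the step size are
# assumptions for this example, not taken from the paper.

rng = np.random.default_rng(1)

def dogd_arrival_order(T=500, dim=5, eta=0.05, max_delay=10):
    x = np.zeros(dim)
    pending = {}                                   # arrival round -> list of gradients
    for t in range(T):
        # Illustrative drifting loss f_t(x) = 0.5 * ||x - c_t||^2, queried at x.
        c_t = np.sin(t / 50.0) * np.ones(dim)
        g_t = x - c_t
        d_t = int(rng.integers(0, max_delay))      # arbitrary delay for this gradient
        pending.setdefault(t + d_t, []).append(g_t)
        # Apply every gradient that arrives in round t, one step per gradient.
        for g in pending.pop(t, []):
            x = x - eta * g
    return x

if __name__ == "__main__":
    print("final iterate:", dogd_arrival_order())
```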