DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization
- URL: http://arxiv.org/abs/2208.00779v3
- Date: Wed, 6 Dec 2023 08:39:10 GMT
- Title: DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization
- Authors: Adel Nabli (MLIA, ISIR, MILA), Edouard Oyallon (MLIA, ISIR)
- Abstract summary: DADAO is the first decentralized, accelerated, asynchronous, primal, first-order algorithm to minimize a sum of $L$-smooth and $\mu$-strongly convex functions distributed over a given network of size $n$.
We show that our algorithm requires $\mathcal{O}(n\sqrt{\frac{L}{\mu}}\log(\frac{1}{\epsilon}))$ local gradients and only $\mathcal{O}(n\sqrt{\chi_1\chi_2}\sqrt{\frac{L}{\mu}}\log(\frac{1}{\epsilon}))$ communications to reach a precision $\epsilon$.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work introduces DADAO: the first decentralized, accelerated,
asynchronous, primal, first-order algorithm to minimize a sum of $L$-smooth and
$\mu$-strongly convex functions distributed over a given network of size $n$.
Our key insight is based on modeling the local gradient updates and gossip
communication procedures with separate independent Poisson Point Processes.
This allows us to decouple the computation and communication steps, which can
be run in parallel, while making the whole approach completely asynchronous.
This leads to communication acceleration compared to synchronous approaches.
Our new method employs primal gradients and does not use a multi-consensus
inner loop nor other ad-hoc mechanisms such as Error Feedback, Gradient
Tracking, or a Proximal operator. By relating the inverse of the smallest
positive eigenvalue of the Laplacian matrix $\chi_1$ and the maximal resistance
$\chi_2\leq \chi_1$ of the graph to a sufficient minimal communication rate
between the nodes of the network, we show that our algorithm requires
$\mathcal{O}(n\sqrt{\frac{L}{\mu}}\log(\frac{1}{\epsilon}))$ local gradients
and only
$\mathcal{O}(n\sqrt{\chi_1\chi_2}\sqrt{\frac{L}{\mu}}\log(\frac{1}{\epsilon}))$
communications to reach a precision $\epsilon$, up to logarithmic terms. Thus,
we simultaneously obtain an accelerated rate for both computations and
communications, leading to an improvement over state-of-the-art works; our
simulations further validate the strength of our relatively unconstrained
method.
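As a rough illustration of this decoupling idea (not the authors' implementation), the sketch below simulates two families of independent Poisson point processes: one generating local gradient events at each node and one generating gossip events on each edge. The network, rates, and event handling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes = 5                                       # illustrative network size (assumption)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # a ring graph (assumption)
T = 10.0                                          # simulation horizon
grad_rate = 1.0                                   # intensity of local gradient events per node
gossip_rate = 2.0                                 # intensity of gossip events per edge

def poisson_times(rate, horizon):
    """Event times of a homogeneous Poisson point process on [0, horizon]."""
    t, times = 0.0, []
    while True:
        t += rng.exponential(1.0 / rate)
        if t > horizon:
            return times
        times.append(t)

# One independent Poisson point process per node for computations,
# and one per edge for communications.
events = []
for v in range(n_nodes):
    events += [(t, "grad", v) for t in poisson_times(grad_rate, T)]
for (u, v) in edges:
    events += [(t, "gossip", (u, v)) for t in poisson_times(gossip_rate, T)]

# Because the processes are independent, gradient and gossip events interleave
# arbitrarily: computation and communication steps are decoupled and can run in parallel.
for t, kind, where in sorted(events, key=lambda e: e[0]):
    print(f"t = {t:5.2f}  {kind:6s}  at {where}")
```

In this picture, tuning the gossip intensity relative to the gradient intensity is what the paper relates to the graph quantities $\chi_1$ and $\chi_2$ via a sufficient minimal communication rate.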
Related papers
- Cooperative Multi-Agent Reinforcement Learning: Asynchronous
Communication and Linear Function Approximation [77.09836892653176]
We study multi-agent reinforcement learning in the setting of episodic Markov decision processes.
We propose a provably efficient algorithm based on value iteration that enables asynchronous communication.
We show that a minimal $\Omega(dM)$ communication complexity is required to improve the performance through collaboration.
arXiv Detail & Related papers (2023-05-10T20:29:29Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k\log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We analyze a training algorithm for distributed computation with workers that communicate at varying frequencies.
In this work, we obtain a tighter convergence rate of $\mathcal{O}(\sigma^2\epsilon^{-2} + \tau_{avg}\epsilon^{-1})$, where $\tau_{avg}$ is the average delay.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{1-\rho}}\,\log\frac{1}{\varepsilon}\big)$ communication steps.
This rate matches (up to poly-log factors), for the first time, the lower communication complexity bounds of distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - Asynchronous Stochastic Optimization Robust to Arbitrary Delays [54.61797739710608]
We consider stochastic optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale gradient from step $t - d_t$ for an arbitrary delay $d_t$.
Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
arXiv Detail & Related papers (2021-06-22T15:50:45Z) - Communication-efficient SGD: From Local SGD to One-Shot Averaging [16.00658606157781]
We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers.
We suggest a Local SGD scheme that reduces the overall communication by communicating less frequently as the number of iterations grows.
arXiv Detail & Related papers (2021-06-09T01:10:34Z) - Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization [59.65871549878937]
We prove that the practical single-loop accelerated gradient tracking needs $O\big((\frac{\gamma}{1-\sigma_\gamma})^2\sqrt{\frac{L}{\epsilon}}\big)$ iterations.
Our convergence rates improve significantly over the ones of $O(\frac{1}{\epsilon^{5/7}})$ and $O\big((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon}\big)$.
arXiv Detail & Related papers (2021-04-06T15:34:14Z) - On the Benefits of Multiple Gossip Steps in Communication-Constrained
Decentralized Optimization [29.42301299741866]
We show that having $O(\log\frac{1}{\epsilon})$ gradient iterations with constant step size, with $O(\log\frac{1}{\epsilon})$ gossip steps between every pair of these iterations, enables convergence to within $\epsilon$ of the optimal value for smooth non-convex objectives under compressed gradient communication (a generic sketch of this gradient-then-gossip pattern appears after this list).
To our knowledge, this is the first work that derives convergence results for non-convex decentralized optimization under compressed communication.
arXiv Detail & Related papers (2020-11-20T21:17:32Z)
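To make the gradient-then-gossip pattern from the last entry concrete, here is a generic decentralized-optimization sketch (not that paper's algorithm, and without gradient compression); the mixing matrix `W`, step size, and toy objective are assumptions chosen for illustration only.

```python
import numpy as np

def decentralized_step(x, grad_fns, W, lr=0.1, gossip_steps=5):
    """One local gradient step per node followed by several gossip (averaging) rounds.

    x: (n_nodes, dim) array of local iterates, grad_fns: one gradient callable per node,
    W: doubly stochastic mixing matrix of the communication graph (assumption).
    """
    # Local computation: each node takes a gradient step on its own objective.
    x = x - lr * np.stack([g(xi) for g, xi in zip(grad_fns, x)])
    # Communication: several gossip rounds, each averaging iterates over neighbours.
    for _ in range(gossip_steps):
        x = W @ x
    return x

# Toy usage on a 3-node averaging problem (illustrative values only).
W = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
targets = np.array([[1.0], [2.0], [3.0]])
grad_fns = [lambda xi, t=t: xi - t for t in targets]   # gradients of 0.5 * ||x - t||^2
x = np.zeros((3, 1))
for _ in range(50):
    x = decentralized_step(x, grad_fns, W, lr=0.5, gossip_steps=5)
print(x)  # every node approaches the average of the targets, 2.0
```

Running more gossip rounds per gradient step drives the local iterates closer to their network average before the next computation, which is the computation/communication trade-off analyzed in these related works.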