Revisiting Decentralized ProxSkip: Achieving Linear Speedup
- URL: http://arxiv.org/abs/2310.07983v2
- Date: Fri, 19 Apr 2024 05:21:58 GMT
- Title: Revisiting Decentralized ProxSkip: Achieving Linear Speedup
- Authors: Luyao Guo, Sulaiman A. Alghunaim, Kun Yuan, Laurent Condat, Jinde Cao
- Abstract summary: Existing analyses of ProxSkip are limited to the strongly convex setting and do not establish linear speedup.
This paper shows that ProxSkip can achieve linear speedup and can reduce communication overhead in proportion to the probability of communication.
- Score: 50.28561300894826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ProxSkip algorithm for decentralized and federated learning is gaining increasing attention due to its proven benefits in accelerating communication complexity while maintaining robustness against data heterogeneity. However, existing analyses of ProxSkip are limited to the strongly convex setting and do not achieve linear speedup, where convergence performance increases linearly with respect to the number of nodes. So far, questions remain open about how ProxSkip behaves in the non-convex setting and whether linear speedup is achievable. In this paper, we revisit decentralized ProxSkip and address both questions. We demonstrate that the leading communication complexity of ProxSkip is $\mathcal{O}\left(\frac{p\sigma^2}{n\epsilon^2}\right)$ for non-convex and convex settings, and $\mathcal{O}\left(\frac{p\sigma^2}{n\epsilon}\right)$ for the strongly convex setting, where $n$ represents the number of nodes, $p$ denotes the probability of communication, $\sigma^2$ signifies the level of stochastic noise, and $\epsilon$ denotes the desired accuracy level. This result illustrates that ProxSkip achieves linear speedup and can asymptotically reduce communication overhead proportional to the probability of communication. Additionally, for the strongly convex setting, we further prove that ProxSkip can achieve linear speedup with network-independent stepsizes.
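The following is a minimal NumPy sketch of a ProxSkip-style decentralized iteration in the spirit of the algorithm analyzed above: every node takes a cheap local gradient step with a control-variate correction at every iteration, while the expensive communication step (here, one round of gossip averaging with a mixing matrix $W$) is performed only with probability $p$. The quadratic local losses, ring-graph mixing matrix, step size, and variable names are illustrative assumptions, not the paper's exact recursion or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem: n nodes, each with a quadratic local loss
# f_i(x) = 0.5 * ||A_i x - b_i||^2; consensus is enforced only through
# the (probabilistically skipped) communication step below.
n, d, p, gamma, T = 8, 5, 0.2, 0.02, 3000
A = rng.standard_normal((n, d, d))
b = rng.standard_normal((n, d))

def grad(i, x):
    # Full local gradient of f_i at x (a stochastic gradient would be used in practice).
    return A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing matrix for a ring graph (illustrative choice).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i + 1) % n] = 0.25
    W[i, (i - 1) % n] = 0.25

x = np.zeros((n, d))  # local iterates x_i (one row per node)
h = np.zeros((n, d))  # local control variates h_i

for t in range(T):
    # Local gradient step with control-variate correction (every iteration).
    g = np.stack([grad(i, x[i]) for i in range(n)])
    x_hat = x - gamma * (g - h)
    if rng.random() < p:
        # Communication step, taken only with probability p: gossip averaging
        # stands in for the prox/projection onto the consensus set.
        x_new = W @ (x_hat - (gamma / p) * h)
        h = h + (p / gamma) * (x_new - x_hat)
        x = x_new
    else:
        x = x_hat
```

In expectation only a $p$ fraction of iterations trigger communication, which is where the factor $p$ in the leading communication complexity terms quoted above comes from.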
Related papers
- Nearly Optimal Regret for Decentralized Online Convex Optimization [53.433398074919]
Decentralized online convex optimization (D-OCO) aims to minimize a sequence of global loss functions using only local computations and communications.
We develop novel D-OCO algorithms that can respectively reduce the regret bounds for convex and strongly convex functions.
Our algorithms are nearly optimal in terms of $T$, $n$, and $\rho$.
arXiv Detail & Related papers (2024-02-14T13:44:16Z) - A Proximal Gradient Method With Probabilistic Multi-Gossip Communications for Decentralized Composite Optimization [36.777745196161035]
We propose a communication-efficient method MG-Skip for decentralized composite (smooth + nonsmooth) optimization.
We show that MG-Skip achieves the optimal communication complexity and confirm the benefits of local updates in the nonsmooth setup.
arXiv Detail & Related papers (2023-12-19T05:13:16Z) - High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise [59.25598762373543]
We establish high-probability convergence guarantees for learning on streaming data in the presence of heavy-tailed noise.
We demonstrate analytically and empirically that the nonlinearity can be tuned to the preferred setting for a given problem.
arXiv Detail & Related papers (2023-10-28T18:53:41Z) - Variance-reduced accelerated methods for decentralized stochastic
double-regularized nonconvex strongly-concave minimax problems [7.5573375809946395]
We consider a network of $m$ computing agents that collaborate via peer-to-peer communications.
Our algorithmic framework introduces a Lagrangian multiplier to eliminate the consensus constraint on the dual variable.
To the best of our knowledge, this is the first work that provides convergence guarantees for NCSC minimax problems with general nonsmooth regularizers applied to both the primal and dual variables.
arXiv Detail & Related papers (2023-07-14T01:32:16Z) - DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization [0.0]
DADAO is the first decentralized, accelerated, asynchronous, primal, first-order algorithm to minimize a sum of $L$-smooth and $\mu$-strongly convex functions distributed over a given network of size $n$.
We show that our algorithm requires $\mathcal{O}\big(n\sqrt{\chi}\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon}\big)$ local gradient computations and only $\mathcal{O}\big(n\sqrt{\chi}\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon}\big)$ communications.
arXiv Detail & Related papers (2022-07-26T08:47:54Z) - DASHA: Distributed Nonconvex Optimization with Communication
Compression, Optimal Oracle Complexity, and No Client Synchronization [77.34726150561087]
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems.
Unlike MARINA, the new methods DASHA and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for federated learning.
arXiv Detail & Related papers (2022-02-02T20:10:40Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\widetilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{1-\rho}}\log\frac{1}{\varepsilon}\big)$ communication steps.
This rate matches, for the first time (up to poly-log factors), the lower communication complexity bounds of distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization [59.65871549878937]
We prove that the practical single-loop accelerated gradient tracking needs $O\big(\big(\frac{\gamma}{1-\sigma_\gamma}\big)^2\sqrt{\frac{L}{\epsilon}}\big)$.
Our convergence rates improve significantly over the ones of $O\big(\frac{1}{\epsilon^{5/7}}\big)$ and $O\big(\big(\frac{L}{\mu}\big)^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon}\big)$.
arXiv Detail & Related papers (2021-04-06T15:34:14Z) - Fully Asynchronous Policy Evaluation in Distributed Reinforcement
Learning over Networks [14.636457985379746]
This paper proposes a fully asynchronous scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks.
Without waiting for any other node of the network, each node can locally update its value function at any time by using (possibly delayed) information from its neighbors.
arXiv Detail & Related papers (2020-03-01T08:12:08Z)