Accelerated Gradient Tracking over Time-varying Graphs for Decentralized
Optimization
- URL: http://arxiv.org/abs/2104.02596v1
- Date: Tue, 6 Apr 2021 15:34:14 GMT
- Title: Accelerated Gradient Tracking over Time-varying Graphs for Decentralized
Optimization
- Authors: Huan Li and Zhouchen Lin
- Abstract summary: This paper revisits and extends the widely used accelerated gradient tracking.
Our complexities improve significantly on the ones of $\cal O(\frac{1}{\epsilon^{5/7}})$ and $\cal O((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon})$.
- Score: 77.57736777744934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized optimization over time-varying graphs has been increasingly
common in modern machine learning with massive data stored on millions of
mobile devices, such as in federated learning. This paper revisits and extends
the widely used accelerated gradient tracking. We prove the $\cal
O(\frac{\gamma^2}{(1-\sigma_{\gamma})^2}\sqrt{\frac{L}{\epsilon}})$ and $\cal
O((\frac{\gamma}{1-\sigma_{\gamma}})^{1.5}\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon})$
complexities for the practical single loop accelerated gradient tracking over
time-varying graphs when the problems are nonstrongly convex and strongly
convex, respectively, where $\gamma$ and $\sigma_{\gamma}$ are two common
constants characterizing the network connectivity, $\epsilon$ is the desired
precision, and $L$ and $\mu$ are the smoothness and strong convexity constants,
respectively. Our complexities improve significantly on the ones of $\cal
O(\frac{1}{\epsilon^{5/7}})$ and $\cal
O((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon})$
proved in the original literature only for static graphs. When combined with a
multiple consensus subroutine, the dependence on the network connectivity
constants can be further improved. When the network is time-invariant, our
complexities exactly match the lower bounds without hiding any poly-logarithmic
factor for both nonstrongly convex and strongly convex problems.
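As a rough illustration of the single loop accelerated gradient tracking template discussed above, the sketch below implements a generic variant in NumPy: every agent extrapolates its iterate with a momentum term, mixes with its neighbors through a time-varying doubly stochastic matrix $W_k$, and maintains a tracker of the average gradient. This is a minimal sketch, not the paper's exact update rule; the function name, the constant step size `eta`, the momentum parameter `beta`, and the ring-based mixing-matrix generator in the demo are assumptions chosen only for illustration.

```python
import numpy as np

def accelerated_gradient_tracking(grads, X0, mixing_matrices,
                                  eta=0.02, beta=0.4, iters=500):
    """Generic single-loop accelerated gradient tracking sketch (illustrative only).

    grads           -- list of callables; grads[i](x) returns the gradient of f_i at x
    X0              -- (n, d) array of initial local iterates, one row per agent
    mixing_matrices -- iterable yielding a doubly stochastic (n, n) matrix W_k per round
    eta, beta       -- step size and momentum parameter (assumed values, not tuned)
    """
    n, _ = X0.shape
    X, X_prev = X0.copy(), X0.copy()
    # Gradient tracker S: initialized with the local gradients; its average over
    # agents stays equal to the average of the local gradients at the current iterates.
    G_prev = np.stack([grads[i](X[i]) for i in range(n)])
    S = G_prev.copy()

    for _, W in zip(range(iters), mixing_matrices):
        Y = X + beta * (X - X_prev)      # local momentum/extrapolation step
        X_prev = X
        X = W @ Y - eta * S              # mix with neighbors, step along tracked gradient
        G = np.stack([grads[i](X[i]) for i in range(n)])
        S = W @ S + G - G_prev           # gradient tracking update
        G_prev = G
    return X.mean(axis=0)                # average of the local iterates


if __name__ == "__main__":
    # Toy decentralized least squares: f_i(x) = 0.5 * ||A_i x - b_i||^2 on 5 agents.
    rng = np.random.default_rng(0)
    n, m, d = 5, 10, 3
    A = rng.normal(size=(n, m, d))
    b = rng.normal(size=(n, m))
    grads = [lambda x, Ai=A[i], bi=b[i]: Ai.T @ (Ai @ x - bi) for i in range(n)]

    def gossip_matrices():
        # Time-varying mixing: a lazy ring over a fresh random agent ordering each
        # round; the resulting matrix is symmetric and doubly stochastic.
        while True:
            perm = rng.permutation(n)
            W = 0.5 * np.eye(n)
            for k in range(n):
                i, j = perm[k], perm[(k + 1) % n]
                W[i, j] += 0.25
                W[j, i] += 0.25
            yield W

    x_hat = accelerated_gradient_tracking(grads, np.zeros((n, d)), gossip_matrices())
    print("consensus estimate:", x_hat)
```

The multiple consensus variant mentioned in the abstract would replace the single mixing steps `W @ Y` and `W @ S` with several consensus rounds per iteration, which is what further improves the dependence on the network connectivity constants $\gamma$ and $\sigma_{\gamma}$.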
Related papers
- Fast Online Node Labeling for Very Large Graphs [11.700626862639131]
Current methods either invert a graph kernel matrix with $\mathcal{O}(n^3)$ runtime or $\mathcal{O}(n^2)$ space complexity, or sample a large volume of random spanning trees.
We propose an improvement based on the online relaxation technique introduced by a series of works.
arXiv Detail & Related papers (2023-05-25T17:13:08Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Rates of Convergence for Regression with the Graph Poly-Laplacian [3.222802562733786]
Higher order regularity can be obtained by replacing the Laplacian regulariser with a poly-Laplacian regulariser.
We consider graph poly-Laplacian regularisation in a fully supervised, non-parametric, noise corrupted, regression problem.
arXiv Detail & Related papers (2022-09-06T08:59:15Z) - DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization [0.0]
DADAO is the first decentralized, accelerated, asynchronous, primal, first-order algorithm to minimize a sum of $L$-smooth and $\mu$-strongly convex functions distributed over a given network of size $n$.
We show that our algorithm requires $\mathcal{O}(n\sqrt{\chi}\sqrt{\frac{L}{\mu}}\log(\frac{1}{\epsilon}))$ local gradient steps and only $\mathcal{O}(n\sqrt{\chi}\sqrt{\frac{L}{\mu}}\log(\frac{1}{\epsilon}))$ communications.
arXiv Detail & Related papers (2022-07-26T08:47:54Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We analyze a training algorithm for distributed computation workers with varying communication frequencies.
In this work, we obtain a tighter convergence rate of $\mathcal{O}(\sigma^2\epsilon^{-2}+\tau_{avg}\epsilon^{-1})$.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex.
We propose a method to solve such problems efficiently in terms of master-to-agents communication and local calls.
The proposed method achieves a much sharper bound than the $\mathcal{O}(\sqrt{L_q/\mu})$ one.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Hybrid Stochastic-Deterministic Minibatch Proximal Gradient:
Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We show that HSDMPG can attain an $\mathcal{O}\big(1/\sqrt{n}\big)$ generalization bound, which is on the order of the intrinsic excess error of a learning model.
arXiv Detail & Related papers (2020-09-18T02:18:44Z) - Revisiting EXTRA for Smooth Distributed Optimization [70.65867695317633]
We give a sharp complexity analysis for EXTRA with the improved $O\left(\left(\frac{L}{\mu}+\frac{1}{1-\sigma_2(W)}\right)\log\frac{1}{\epsilon(1-\sigma_2(W))}\right)$ complexity.
Our communication complexities of the accelerated EXTRA are only worse by the factors of $\left(\log\frac{L}{\mu(1-\sigma_2(W))}\right)$ and $\left(\log\frac{1}{\epsilon(1-\sigma_2(W))}\right)$ than the lower complexity bounds.
arXiv Detail & Related papers (2020-02-24T08:07:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.