DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample
Complexity
- URL: http://arxiv.org/abs/2202.00255v2
- Date: Mon, 31 Jul 2023 04:34:34 GMT
- Title: DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample
Complexity
- Authors: Chung-Yiu Yau, Hoi-To Wai
- Abstract summary: This paper proposes the Doubly Compressed Momentum-assisted stochastic gradient tracking algorithm $\texttt{DoCoM}$ for communication-efficient decentralized optimization.
We show that our algorithm outperforms several state-of-the-art algorithms in practice.
- Score: 25.775517797956237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes the Doubly Compressed Momentum-assisted stochastic
gradient tracking algorithm $\texttt{DoCoM}$ for communication-efficient
decentralized optimization. The algorithm features two main ingredients to
achieve a near-optimal sample complexity while allowing for communication
compression. First, the algorithm tracks both the averaged iterate and
stochastic gradient using compressed gossiping consensus. Second, a momentum
step is incorporated for adaptive variance reduction with the local gradient
estimates. We show that $\texttt{DoCoM}$ finds a near-stationary solution at
all participating agents satisfying $\mathbb{E}[ \| \nabla f( \theta ) \|^2 ] =
\mathcal{O}( 1 / T^{2/3} )$ in $T$ iterations, where $f(\theta)$ is a smooth
(possibly non-convex) objective function. Notably, the proof hinges on
analytically designing a new potential function that tightly tracks the
one-iteration progress of $\texttt{DoCoM}$. As a corollary, our analysis also
establishes the linear convergence of $\texttt{DoCoM}$ to a globally optimal
solution for objective functions satisfying the Polyak-{\L}ojasiewicz condition.
Numerical experiments demonstrate that our algorithm outperforms several
state-of-the-art algorithms in practice.
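Below is a minimal, self-contained sketch of the two ingredients the abstract describes: compressed gossip tracking of both the averaged iterate and the averaged stochastic gradient, combined with a momentum step for variance reduction. It assumes top-$k$ sparsification as the compressor, a ring-topology mixing matrix, and toy local quadratic objectives; all names, constants, and the exact ordering of the descent, compression, gossip, and momentum steps are guesses pieced together from the abstract, not the authors' reference implementation.

```python
# A sketch of a DoCoM-style iteration, assuming:
#  - top-k sparsification as the (contractive) compressor,
#  - a ring-topology doubly stochastic mixing matrix W,
#  - toy local quadratic objectives with additive gradient noise.
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 8, 10, 500                       # agents, dimension, iterations
A = rng.standard_normal((n, d, d)) / d     # data for the local quadratics
b = rng.standard_normal((n, d))

def stoch_grad(i, th, noise):
    """Stochastic gradient of agent i's quadratic; the caller passes the same
    noise twice so the momentum step re-evaluates one 'sample' at two points."""
    return A[i].T @ (A[i] @ th - b[i]) + 0.01 * noise

def compress(x, k=2):
    """Top-k sparsification: transmit only the k largest-magnitude entries."""
    idx = np.argsort(np.abs(x))[-k:]
    out = np.zeros_like(x)
    out[idx] = x[idx]
    return out

# Doubly stochastic ring mixing matrix.
W = np.zeros((n, n))
for i in range(n):
    W[i, i], W[i, (i - 1) % n], W[i, (i + 1) % n] = 0.5, 0.25, 0.25

eta, gamma, beta = 0.05, 0.4, 0.2          # step size, gossip step, momentum
theta = np.zeros((n, d))                   # local iterates
hat_theta = np.zeros((n, d))               # compressed "public" iterate copies
noise = rng.standard_normal((n, d))
v = np.stack([stoch_grad(i, theta[i], noise[i]) for i in range(n)])
g, hat_g = v.copy(), np.zeros((n, d))      # gradient trackers + public copies

for t in range(T):
    theta_old = theta.copy()
    theta = theta - eta * g                # local descent along the tracker
    # Compressed gossip on iterates: only compressed residuals are exchanged.
    hat_theta += np.stack([compress(theta[i] - hat_theta[i]) for i in range(n)])
    theta = theta + gamma * (W @ hat_theta - hat_theta)
    # Momentum-based variance reduction on the local gradient estimate.
    noise = rng.standard_normal((n, d))
    v_new = np.stack([stoch_grad(i, theta[i], noise[i])
                      + (1 - beta) * (v[i] - stoch_grad(i, theta_old[i], noise[i]))
                      for i in range(n)])
    # Compressed gossip tracking of the network-averaged gradient.
    g = g + v_new - v
    hat_g += np.stack([compress(g[i] - hat_g[i]) for i in range(n)])
    g = g + gamma * (W @ hat_g - hat_g)
    v = v_new

print("norm of averaged tracker:", np.linalg.norm(g.mean(axis=0)))
```

In the paper, the compressor's contraction factor, the step sizes, and the momentum weight are coupled through the potential-function analysis; the hard-coded constants above are placeholders chosen only to keep the toy run stable.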
Related papers
- An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization [37.300102993926046]
We study the complexity of producing approximately stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only zero-order evaluations.
Our analysis is based on a simple yet powerful lemma regarding the Goldstein subdifferential set, which allows us to leverage recent advancements in nonsmooth nonconvex optimization.
arXiv Detail & Related papers (2023-07-10T11:56:04Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale, and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - A One-Sample Decentralized Proximal Algorithm for Non-Convex Stochastic
Composite Optimization [10.762749887051546]
We propose two single-time-scale algorithms: Prox-DASA and Prox-DASA-GT.
Unlike prior work, our algorithms achieve comparable complexity without requiring large batch sizes, more complex per-iteration operations, or stronger assumptions.
arXiv Detail & Related papers (2023-02-20T05:16:18Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum
Optimization [25.21457349137344]
We prove that DEAREST requires at most $\mathcal{O}(mn + \sqrt{mn}\,L\varepsilon^{-2})$ incremental first-order oracle (IFO) calls and $\mathcal{O}(L\varepsilon^{-2}/\sqrt{1-\lambda_2(W)})$ communication rounds.
arXiv Detail & Related papers (2022-10-25T11:37:11Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel estimator named Multi-block-Single-probe Variance Reduced (MSVR) to alleviate the complexity of coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on strong convexity.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem $\min_{\mathbf{x}} \max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an \emph{accelerated gradient-extragradient} (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov's acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex and the agents' local functions satisfy a similarity condition.
We propose a method that solves such problems in a master/agents architecture, trading off communication rounds against local gradient calls.
The communication complexity of the proposed method is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ rate of existing methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
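To make the two-timescale structure of the last entry concrete, here is a toy sketch on a hypothetical quadratic bilevel problem: the inner variable is updated with a faster-decaying step size than the outer one, and the tracked inner variable is plugged into the outer gradient, as TTSA-style schemes do. The problem, step-size exponents, and noise levels are all illustrative assumptions, not the paper's setup.

```python
# A toy two-timescale (TTSA-style) loop on a hypothetical bilevel problem:
#   lower level  g(x, y) = ||y - x||^2 / 2   (so y*(x) = x),
#   upper level  f(x, y) = ||y||^2 / 2       (so grad F(x) = y*(x) = x).
import numpy as np

rng = np.random.default_rng(1)
x, y = np.ones(5), np.zeros(5)

for t in range(1, 2001):
    alpha = 0.5 / t ** 0.6                 # fast step for the inner variable
    beta = 0.5 / t ** 0.9                  # slower outer step: beta/alpha -> 0
    # Inner update: y tracks the lower-level solution y*(x) = x.
    y -= alpha * ((y - x) + 0.01 * rng.standard_normal(5))
    # Outer update: plug the tracked y into grad F(x) = y*(x).
    x -= beta * (y + 0.01 * rng.standard_normal(5))

print("x ->", np.round(x, 3))              # both variables drift toward 0
```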