Stochastic Gradient Methods with Compressed Communication for
Decentralized Saddle Point Problems
- URL: http://arxiv.org/abs/2205.14452v2
- Date: Fri, 14 Apr 2023 07:14:46 GMT
- Title: Stochastic Gradient Methods with Compressed Communication for
Decentralized Saddle Point Problems
- Authors: Chhavi Sharma, Vishnu Narayanan, P. Balamurugan
- Abstract summary: We develop two compression-based stochastic gradient algorithms to solve a class of non-smooth strongly convex-strongly concave saddle-point problems.
Our first algorithm is a Restart-based Decentralized Proximal Stochastic Gradient method with Compression (C-RDPSG) for general stochastic settings.
Next, we present a Decentralized Proximal Stochastic Variance Reduced Gradient algorithm with Compression (C-DPSVRG) for the finite sum setting.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop two compression based stochastic gradient algorithms to solve a
class of non-smooth strongly convex-strongly concave saddle-point problems in a
decentralized setting (without a central server). Our first algorithm is a
Restart-based Decentralized Proximal Stochastic Gradient method with
Compression (C-RDPSG) for general stochastic settings. We provide rigorous
theoretical guarantees of C-RDPSG with gradient computation complexity and
communication complexity of order $\mathcal{O}( (1+\delta)^4
\frac{1}{L^2}{\kappa_f^2}\kappa_g^2 \frac{1}{\epsilon} )$, to achieve an
$\epsilon$-accurate saddle-point solution, where $\delta$ denotes the
compression factor, $\kappa_f$ and $\kappa_g$ denote respectively the condition
numbers of objective function and communication graph, and $L$ denotes the
smoothness parameter of the smooth part of the objective function. Next, we
present a Decentralized Proximal Stochastic Variance Reduced Gradient algorithm
with Compression (C-DPSVRG) for finite sum setting which exhibits gradient
computation complexity and communication complexity of order $\mathcal{O}
\left((1+\delta) \max \{\kappa_f^2, \sqrt{\delta}\kappa^2_f\kappa_g,\kappa_g \}
\log\left(\frac{1}{\epsilon}\right) \right)$. Extensive numerical experiments
show competitive performance of the proposed algorithms and provide support to
the theoretical results obtained.
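To make the algorithmic template concrete, the following is a minimal, illustrative sketch of one compressed decentralized proximal stochastic gradient descent-ascent loop on a toy strongly convex-strongly concave problem. It is not the authors' C-RDPSG or C-DPSVRG (whose precise updates, restart schedule and error compensation are specified in the paper); the top-k compressor, complete-graph mixing matrix, step size and problem data below are assumptions chosen only for illustration.

```python
import numpy as np

# Illustrative sketch only: a naive compressed decentralized proximal stochastic
# gradient descent-ascent loop for min_x max_y sum_i [ x^T A_i y + 0.5||x||^2
# - 0.5||y||^2 ] + lam*||x||_1 on a toy network. The compressor, mixing matrix,
# step size and noise model are assumptions, not the paper's exact algorithms.

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def prox_l1(v, tau):
    """Proximal operator of tau*||.||_1 (soft-thresholding), the non-smooth part."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
n_nodes, d, k = 4, 10, 3
step, lam, noise = 0.05, 0.01, 0.01
W = np.full((n_nodes, n_nodes), 1.0 / n_nodes)                   # doubly stochastic mixing matrix (complete graph)
A = [0.1 * rng.standard_normal((d, d)) for _ in range(n_nodes)]  # per-node coupling matrices

x = rng.standard_normal((n_nodes, d))
y = rng.standard_normal((n_nodes, d))

for t in range(200):
    # Local stochastic gradients of each node's saddle function (noise mimics sampling).
    gx = np.stack([A[i] @ y[i] + x[i] + noise * rng.standard_normal(d) for i in range(n_nodes)])
    gy = np.stack([A[i].T @ x[i] - y[i] + noise * rng.standard_normal(d) for i in range(n_nodes)])

    # Nodes exchange compressed iterates and average with neighbors.
    x_mix = W @ np.stack([top_k(x[i], k) for i in range(n_nodes)])
    y_mix = W @ np.stack([top_k(y[i], k) for i in range(n_nodes)])

    # Proximal gradient descent in x, gradient ascent in y.
    x = prox_l1(x_mix - step * gx, step * lam)
    y = y_mix + step * gy

print("node disagreement in x:", float(np.linalg.norm(x - x.mean(axis=0))))
```

Note that practical compressed decentralized methods typically communicate compressed differences together with error compensation so that compression does not prevent exact convergence; the naive iterate compression above is only meant to show where compression enters the communication step.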
Related papers
- Accelerating Inexact HyperGradient Descent for Bilevel Optimization [84.00488779515206]
We present a method for solving general nonconvex-strongly-convex bilevel optimization problems.
Our results also improve upon the existing complexity for finding second-order stationary points in nonconvex-strongly-convex problems.
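For context, the hypergradient that such methods approximate inexactly is the standard one for $\min_{x} f(x, y^*(x))$ with $y^*(x) = \arg\min_y g(x,y)$ and $g$ strongly convex in $y$: $\nabla \Phi(x) = \nabla_x f(x, y^*(x)) - \nabla^2_{xy} g(x, y^*(x))\, [\nabla^2_{yy} g(x, y^*(x))]^{-1} \nabla_y f(x, y^*(x))$; "inexact" refers to replacing $y^*(x)$ and the Hessian-inverse-vector product with approximations. (This is background on the general setup, not a statement of the paper's specific estimator.)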
arXiv Detail & Related papers (2023-06-30T20:36:44Z) - CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence [9.11726703830074]
In this paper, we consider solving the distributed optimization problem under a communication-restricted setting.
We propose a method based on compressed exact diffusion, termed CEDAS.
arXiv Detail & Related papers (2023-01-14T09:49:15Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization, which minimizes a twice differentiable function subject to conic constraints.
In particular, we propose a Newton-CG based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
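As background on the smoothing step (not the paper's sharper ReSQue construction itself): for the Gaussian convolution $\hat{f}_\rho(x) = \mathbb{E}_{z \sim \mathcal{N}(0,\rho^2 I)}[f(x+z)]$ one has $\nabla \hat{f}_\rho(x) = \mathbb{E}_{z \sim \mathcal{N}(0,\rho^2 I)}[\nabla f(x+z)]$, so a single (sub)gradient query at a Gaussian perturbation of $x$ is already an unbiased estimator of the smoothed gradient; the ReSQue estimator refines this idea by reweighting such queries.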
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - An Efficient Stochastic Algorithm for Decentralized Nonconvex-Strongly-Concave Minimax Optimization [25.00475462213752]
We propose the decentralized recursive gradient descent ascent method (DREAM).
Concretely, it requires $\mathcal{O}(\min\{\kappa^3\epsilon^{-3}, \kappa^2\sqrt{N}\epsilon^{-2}\})$ stochastic first-order oracle (SFO) calls and $\tilde{\mathcal{O}}(\kappa^2\epsilon^{-2})$ communication rounds.
Our numerical experiments validate the superiority of the proposed method over previous ones.
arXiv Detail & Related papers (2022-12-05T16:09:39Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient updates with Nesterov acceleration.
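As a reminder of the extragradient building block referenced here (the AG-EG method itself additionally accelerates the $F$ component, which is not shown): with step size $\eta$, one first forms the look-ahead point $\mathbf{x}_{t+1/2} = \mathbf{x}_t - \eta\,(\nabla F(\mathbf{x}_t) + \nabla_{\mathbf{x}} H(\mathbf{x}_t,\mathbf{y}_t))$, $\mathbf{y}_{t+1/2} = \mathbf{y}_t + \eta\,(\nabla_{\mathbf{y}} H(\mathbf{x}_t,\mathbf{y}_t) - \nabla G(\mathbf{y}_t))$, and then re-updates from $(\mathbf{x}_t,\mathbf{y}_t)$ using gradients evaluated at the look-ahead point: $\mathbf{x}_{t+1} = \mathbf{x}_t - \eta\,(\nabla F(\mathbf{x}_{t+1/2}) + \nabla_{\mathbf{x}} H(\mathbf{x}_{t+1/2},\mathbf{y}_{t+1/2}))$ and $\mathbf{y}_{t+1} = \mathbf{y}_t + \eta\,(\nabla_{\mathbf{y}} H(\mathbf{x}_{t+1/2},\mathbf{y}_{t+1/2}) - \nabla G(\mathbf{y}_{t+1/2}))$.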
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex.
We propose a method for solving such problems in the distributed setting via master-to-agents communication and local gradient calls.
The resulting rate is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ rate of existing methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - DoCoM: Compressed Decentralized Optimization with Near-Optimal Sample
Complexity [25.775517797956237]
This paper proposes the Doubly Compressed Momentum-assisted stochastic gradient tracking algorithm ($\texttt{DoCoM}$) for communication-efficient decentralized optimization.
We show that our algorithm outperforms several state-of-the-art algorithms in practice.
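For context, the uncompressed gradient-tracking recursion that DoCoM's compressed, momentum-assisted tracker builds on maintains at each node $i$ an estimate $g_i^t$ of the network-average gradient via $g_i^{t+1} = \sum_j W_{ij} g_j^{t} + \nabla f_i(x_i^{t+1}) - \nabla f_i(x_i^{t})$ with $g_i^0 = \nabla f_i(x_i^0)$, alongside the iterate update $x_i^{t+1} = \sum_j W_{ij} x_j^{t} - \eta\, g_i^{t}$, where $W$ is a doubly stochastic mixing matrix; DoCoM additionally compresses the exchanged quantities and adds momentum, with details in the cited paper.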
arXiv Detail & Related papers (2022-02-01T07:27:34Z) - Escaping Saddle Points with Compressed SGD [8.014396597444296]
Stochastic gradient descent (SGD) is a prevalent optimization technique for large-scale distributed machine learning.
We show that SGD augmented with gradient compression converges to an $\varepsilon$-first-order stationary point.
When the gradient is not Lipschitz, SGD with the RandomK compressor converges to an $\varepsilon$-SOSP with the same number of iterations as SGD.
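For concreteness, here is a minimal sketch of a Random-K sparsifier of the kind referred to above (the exact compressor and scaling used in the cited paper may differ; the $d/K$ rescaling below is the usual choice that makes the compressor unbiased):

```python
import numpy as np

def random_k(v, k, rng):
    """Random-K sparsification sketch: keep k uniformly random coordinates of v,
    rescaled by d/k so that E[random_k(v)] = v (unbiased)."""
    d = v.shape[0]
    idx = rng.choice(d, size=k, replace=False)   # k coordinates chosen uniformly at random
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)                  # rescale to preserve the mean
    return out

rng = np.random.default_rng(0)
v = rng.standard_normal(10)
avg = np.mean([random_k(v, 3, rng) for _ in range(20000)], axis=0)
print(np.max(np.abs(avg - v)))                   # small: the compressor is unbiased on average
```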
arXiv Detail & Related papers (2021-05-21T01:56:43Z) - Hybrid Stochastic-Deterministic Minibatch Proximal Gradient:
Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We show that HSDMPG can attain an optimization error of order $\mathcal{O}\big(1/\sqrt{n}\big)$, which is at the order of the intrinsic excess error of a learning model.
For loss functions of quadratic growth, we prove that HSDMPG attains this excess-error order with less than a single pass over the data.
arXiv Detail & Related papers (2020-09-18T02:18:44Z) - Fast decentralized non-convex finite-sum optimization with recursive
variance reduction [19.540926205375857]
We describe a first-order gradient method, called GT-SARAH, that employs a SARAH-type variance reduction technique.
In particular, in a big-data regime such that $n = O(N^{1/2}(1-\lambda)^{3})$, this gradient complexity reduces to $O(N^{1/2}L\epsilon^{-2})$, independent of the network topology.
In addition, we show that appropriate choices of the local minibatch size balance the trade-offs between gradient computation complexity and communication complexity.
arXiv Detail & Related papers (2020-08-17T15:51:32Z)
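For reference, the SARAH-type recursion that GT-SARAH employs is, in its basic single-machine form, $v_t = \nabla f_{i_t}(x_t) - \nabla f_{i_t}(x_{t-1}) + v_{t-1}$ with a periodic full-gradient refresh $v_0 = \nabla f(x_0)$ and update $x_{t+1} = x_t - \eta\, v_t$; the recursive correction keeps the variance of $v_t$ small without storing per-sample gradients. (GT-SARAH combines this with gradient tracking across the network; the exact coupling is specified in the paper.)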