Switch and Conquer: Efficient Algorithms By Switching Stochastic
Gradient Oracles For Decentralized Saddle Point Problems
- URL: http://arxiv.org/abs/2309.00997v2
- Date: Wed, 13 Sep 2023 16:07:28 GMT
- Title: Switch and Conquer: Efficient Algorithms By Switching Stochastic
Gradient Oracles For Decentralized Saddle Point Problems
- Authors: Chhavi Sharma, Vishnu Narayanan and P. Balamurugan
- Abstract summary: We develop an inexact primal dual hybrid gradient (inexact PDHG) procedure that allows generic gradient oracles to update the primal and dual variables.
We show that utilizing the best convergence phases of GSG and SVRG oracles makes C-DPSSG well suited for obtaining solutions of low/medium accuracy faster.
- Score: 1.2277343096128712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a class of non-smooth strongly convex-strongly concave saddle
point problems in a decentralized setting without a central server. To solve a
consensus formulation of problems in this class, we develop an inexact primal
dual hybrid gradient (inexact PDHG) procedure that allows generic gradient
computation oracles to update the primal and dual variables. We first
investigate the performance of inexact PDHG with stochastic variance reduction
gradient (SVRG) oracle. Our numerical study uncovers a significant phenomenon
of initial conservative progress of the iterates of inexact PDHG with the SVRG oracle. To
tackle this, we develop a simple and effective switching idea, where a
generalized stochastic gradient (GSG) computation oracle is employed to hasten
the iterates' progress to a saddle point solution during the initial phase of
updates, followed by a switch to the SVRG oracle at an appropriate juncture.
The proposed algorithm is named Decentralized Proximal Switching Stochastic
Gradient method with Compression (C-DPSSG), and is proven to converge to an
$\epsilon$-accurate saddle point solution with linear rate. Apart from
delivering highly accurate solutions, our study reveals that utilizing the best
convergence phases of GSG and SVRG oracles makes C-DPSSG well suited for
obtaining solutions of low/medium accuracy faster, useful for certain
applications. Numerical experiments on two benchmark machine learning
applications show C-DPSSG's competitive performance, which validates our
theoretical findings. The codes used in the experiments can be found
\href{https://github.com/chhavisharma123/C-DPSSG-CDC2023}{here}.
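As a rough single-machine illustration of the oracle-switching idea in the abstract, the Python sketch below runs a plain stochastic gradient (GSG-style) oracle for an initial number of iterations and then switches to an SVRG-style variance-reduced oracle inside a stochastic gradient descent-ascent loop on a toy saddle point problem. The toy objective, step size, and switch point are assumptions made only for illustration; the paper's C-DPSSG additionally performs inexact PDHG proximal updates, compression, and decentralized communication, none of which are modeled here.
```python
import numpy as np

# Minimal sketch of oracle switching: GSG-style oracle first, SVRG-style oracle later.
# Toy objective (an assumption, not the paper's setting):
#   f(x, y) = (1/n) sum_i x^T A_i y + (mu/2)||x||^2 - (gamma/2)||y||^2
rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d, d)) / np.sqrt(d)   # per-sample coupling matrices
mu, gamma, eta = 1.0, 1.0, 0.05               # strong convexity/concavity, step size

def grad_i(i, x, y):
    """Per-sample partial gradients of f with respect to x and y."""
    return A[i] @ y + mu * x, A[i].T @ x - gamma * y

def full_grad(x, y):
    """Full-batch partial gradients, used for SVRG snapshots."""
    Abar = A.mean(axis=0)
    return Abar @ y + mu * x, Abar.T @ x - gamma * y

x, y = rng.normal(size=d), rng.normal(size=d)
switch_iter, total_iters, snapshot_every = 300, 1500, n
snap_x = snap_y = snap_gx = snap_gy = None

for t in range(total_iters):
    i = rng.integers(n)
    if t < switch_iter:
        # Phase 1: GSG-style oracle -- cheap per-iteration cost, fast initial progress.
        gx, gy = grad_i(i, x, y)
    else:
        # Phase 2: SVRG-style oracle -- variance reduction for fast convergence near the solution.
        if snap_x is None or (t - switch_iter) % snapshot_every == 0:
            snap_x, snap_y = x.copy(), y.copy()
            snap_gx, snap_gy = full_grad(snap_x, snap_y)
        gx_i, gy_i = grad_i(i, x, y)
        sx_i, sy_i = grad_i(i, snap_x, snap_y)
        gx, gy = gx_i - sx_i + snap_gx, gy_i - sy_i + snap_gy
    # Stochastic gradient descent-ascent step (a stand-in for the inexact PDHG update).
    x = x - eta * gx
    y = y + eta * gy

print("||x||, ||y|| at the end:", np.linalg.norm(x), np.linalg.norm(y))
```
The switch point reflects the observation in the abstract: the GSG-style oracle makes fast progress from a cold start, while the SVRG-style oracle is better suited once the iterates are near the saddle point.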
Related papers
- ALEXR: An Optimal Single-Loop Algorithm for Convex Finite-Sum Coupled Compositional Stochastic Optimization [53.14532968909759]
We introduce an efficient single-loop primal-dual block-coordinate algorithm, dubbed ALEXR.
We establish the convergence rates of ALEXR in both convex and strongly convex cases under smoothness and non-smoothness conditions.
We present lower complexity bounds to demonstrate that the convergence rates of ALEXR are optimal among first-order block-coordinate algorithms for the considered class of cFCCO problems.
arXiv Detail & Related papers (2023-12-04T19:00:07Z) - DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem (see the sketch after this list).
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$.
arXiv Detail & Related papers (2022-08-29T18:01:42Z) - On the Convergence to a Global Solution of Shuffling-Type Gradient
Algorithms [18.663264755108703]
Stochastic gradient descent (SGD) is the method of choice in many machine learning tasks.
In this paper, we show that SGD achieves the desired computational complexity, comparable to that of the convex setting.
arXiv Detail & Related papers (2022-06-13T01:25:59Z) - Decentralized Stochastic Proximal Gradient Descent with Variance
Reduction over Time-varying Networks [30.231314171218994]
In decentralized learning, a network of nodes cooperate to minimize an overall objective function that is usually the finite-sum of their local objectives.
We propose a novel algorithm, namely DPSVRG, to accelerate the decentralized training by leveraging the variance reduction technique.
arXiv Detail & Related papers (2021-12-20T08:23:36Z) - COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic
Convex Optimization [4.970364068620608]
We exploit convexity and L-smoothness to improve the noisy estimates outputted by the gradient oracle.
We show that increasing the number and proximity of the queried points leads to better gradient estimates.
We also apply COCO in vanilla settings by plugging it into existing algorithms, such as SGD, Adam or STRSAGA.
arXiv Detail & Related papers (2021-09-07T17:21:09Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - Escaping Saddle Points with Stochastically Controlled Stochastic
Gradient Methods [12.173568611144626]
We show that the first-order stochastically controlled stochastic gradient (SCSG) method can escape saddle points when perturbed by noise or by a dedicated step.
A separate step is proposed to help escape saddle points.
The proposed step is incorporated into the proposed CNC-SCSGD method for saddle points.
arXiv Detail & Related papers (2021-03-07T18:09:43Z) - A Retrospective Approximation Approach for Smooth Stochastic
Optimization [0.2867517731896504]
Stochastic Gradient (SG) is the de facto iterative technique to solve stochastic optimization (SO) problems with a smooth (possibly non-convex) objective function.
arXiv Detail & Related papers (2021-03-07T16:29:36Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of an optimistic gradient algorithm for nonconvex-nonconcave min-max problems.
Our experiments show empirically that adaptive gradient algorithms outperform their non-adaptive counterparts in GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
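As a rough sketch of the KL-regularized distributionally robust formulation mentioned in the DR-DSGD entry above (the notation below is assumed for illustration and is not taken from that paper): with local losses $f_i(\theta)$, $i = 1, \dots, N$, a weight vector $p$ in the probability simplex $\Delta_N$, and a regularization parameter $\lambda > 0$, the regularized robust min-max problem reads
$$ \min_{\theta} \; \max_{p \in \Delta_N} \; \sum_{i=1}^{N} p_i f_i(\theta) \;-\; \lambda \, \mathrm{KL}\!\left(p \,\middle\|\, \tfrac{1}{N}\mathbf{1}\right), $$
and maximizing over $p$ in closed form reduces it to the modified robust problem
$$ \min_{\theta} \; \lambda \log\!\left( \frac{1}{N} \sum_{i=1}^{N} e^{f_i(\theta)/\lambda} \right). $$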