Variance-reduced accelerated methods for decentralized stochastic
double-regularized nonconvex strongly-concave minimax problems
- URL: http://arxiv.org/abs/2307.07113v1
- Date: Fri, 14 Jul 2023 01:32:16 GMT
- Title: Variance-reduced accelerated methods for decentralized stochastic
double-regularized nonconvex strongly-concave minimax problems
- Authors: Gabriel Mancino-Ball and Yangyang Xu
- Abstract summary: We consider a network of $m$ computing agents that collaborate via peer-to-peer communications.
Our algorithmic framework introduces a Lagrangian multiplier to eliminate the consensus constraint on the dual variable.
To the best of our knowledge, this is the first work which provides convergence guarantees for NCSC minimax problems with general convex nonsmooth regularizers applied to both the primal and dual variables.
- Score: 7.5573375809946395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider the decentralized, stochastic nonconvex
strongly-concave (NCSC) minimax problem with nonsmooth regularization terms on
both primal and dual variables, wherein a network of $m$ computing agents
collaborate via peer-to-peer communications. We consider when the coupling
function is in expectation or finite-sum form and the double regularizers are
convex functions, applied separately to the primal and dual variables. Our
algorithmic framework introduces a Lagrangian multiplier to eliminate the
consensus constraint on the dual variable. Coupling this with
variance-reduction (VR) techniques, our proposed method, VRLM, requires only a
single neighbor communication per iteration and achieves an
$\mathcal{O}(\kappa^3\varepsilon^{-3})$ sample complexity under the general
stochastic setting, with either a big-batch or small-batch VR option, where
$\kappa$ is the condition number of the problem and $\varepsilon$ is the
desired solution accuracy. With a big-batch VR, we can additionally achieve
$\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity. Under the
special finite-sum setting, our method with a big-batch VR can achieve an
$\mathcal{O}(n + \sqrt{n} \kappa^2\varepsilon^{-2})$ sample complexity and
$\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity, where $n$ is
the number of components in the finite sum. All complexity results match the
best-known results achieved by a few existing methods for solving special cases
of the problem we consider. To the best of our knowledge, this is the first
work which provides convergence guarantees for NCSC minimax problems with
general convex nonsmooth regularizers applied to both the primal and dual
variables in the decentralized stochastic setting. Numerical experiments are
conducted on two machine learning problems. Our code is downloadable from
https://github.com/RPI-OPT/VRLM.
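To make the setup concrete, the following is a minimal, self-contained NumPy sketch of the kind of problem and iteration the abstract describes: each agent holds a smooth coupling term that is strongly concave in the dual variable, both primal and dual variables carry an l1 regularizer handled by proximal steps, local gradients use a STORM-style recursive variance-reduced estimator, and each iteration performs one neighbor-averaging round over a ring network. All data, step sizes, and the mixing matrix below are illustrative assumptions, and consensus here is enforced by plain gossip on both variables rather than the paper's Lagrangian-multiplier treatment of the dual consensus constraint; this is a toy sketch, not the VRLM algorithm (the authors' code is at https://github.com/RPI-OPT/VRLM).

import numpy as np

m, d, p, n_i = 4, 10, 5, 20          # agents, primal dim, dual dim, samples per agent
mu, lam = 1.0, 0.01                  # strong-concavity modulus, l1 regularization weight
rng = np.random.default_rng(0)
A = rng.normal(size=(m, n_i, d))     # local least-squares data (toy smooth part in x)
b = rng.normal(size=(m, n_i))
B = rng.normal(size=(m, p, d))       # local coupling matrices between x and y

def grads(i, x, y, idx):
    # Minibatch gradients of the toy coupling
    # f_i(x, y) = (1/n_i) sum_j 0.5*(a_j^T x - b_j)^2 + y^T B_i x - 0.5*mu*||y||^2,
    # which is smooth in x and mu-strongly concave in y (convex in x here, for simplicity).
    Ai, bi = A[i][idx], b[i][idx]
    gx = Ai.T @ (Ai @ x - bi) / len(idx) + B[i].T @ y
    gy = B[i] @ x - mu * y
    return gx, gy

def prox_l1(v, t):
    # Proximal map of t*lam*||.||_1 (soft-thresholding), used for both regularizers.
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

# Ring mixing matrix: each agent averages with its two neighbors once per iteration.
W = np.zeros((m, m))
for i in range(m):
    W[i, i], W[i, (i + 1) % m], W[i, (i - 1) % m] = 0.5, 0.25, 0.25

X, Y = np.zeros((m, d)), np.zeros((m, p))      # local primal/dual copies
Xp, Yp = X.copy(), Y.copy()                    # previous iterates for the VR correction
Gx, Gy = np.zeros((m, d)), np.zeros((m, p))    # variance-reduced gradient estimates
eta_x, eta_y, beta, bs = 1e-2, 5e-2, 0.2, 4    # step sizes, momentum, minibatch size

for k in range(300):
    for i in range(m):
        idx = rng.choice(n_i, size=bs, replace=False)
        gx, gy = grads(i, X[i], Y[i], idx)
        gxp, gyp = grads(i, Xp[i], Yp[i], idx)  # same minibatch at the previous point
        # STORM-style recursive variance-reduced estimator (small-batch flavor).
        Gx[i] = gx + (1 - beta) * (Gx[i] - gxp) if k else gx
        Gy[i] = gy + (1 - beta) * (Gy[i] - gyp) if k else gy
    Xp, Yp = X.copy(), Y.copy()
    X, Y = W @ X, W @ Y                         # one round of neighbor averaging
    X = prox_l1(X - eta_x * Gx, eta_x)          # proximal descent in x (regularizer g)
    Y = prox_l1(Y + eta_y * Gy, eta_y)          # proximal ascent in y (regularizer h)

print("consensus gap on x:", np.linalg.norm(X - X.mean(axis=0)))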
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) estimators for complex gradient problems.
We improve the state-of-the-art function complexities from $\mathcal{O}\left(\min\left\{\frac{d n^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Solving a Class of Non-Convex Minimax Optimization in Federated Learning [84.98927714326908]
Minimax problems arise throughout machine learning applications, ranging from model training to large-scale learning.
We propose a class of algorithms for nonconvex minimax problems that reduce the complexity to $O(\varepsilon^{-6})$.
We prove that FedSGDA-M has the best known sample complexity of $O(\kappa^2\varepsilon^{-3})$ and the best-known communication complexity of $O(\kappa^2\varepsilon^{-3})$.
arXiv Detail & Related papers (2023-10-05T15:48:41Z) - Distributed Extra-gradient with Optimal Complexity and Communication
Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, the de facto algorithm for monotone VI problems, was not designed to be communication-efficient; a minimal sketch of the basic extra-gradient step appears after this list.
We propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs.
arXiv Detail & Related papers (2023-08-17T21:15:04Z) - PRECISION: Decentralized Constrained Min-Max Learning with Low
Communication and Sample Complexities [25.153506493249854]
We present an adaptive multi-agent learning technique for min-max optimization problems.
We also propose an algorithm called PRECISION that enjoys a reduction in the number of iterations.
arXiv Detail & Related papers (2023-03-05T00:26:10Z) - An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum
Optimization [25.21457349137344]
We prove that DEAREST requires at most $\mathcal{O}(mn + \sqrt{mn}\,L\varepsilon^{-2})$ incremental first-order oracle (IFO) calls and $\mathcal{O}\big(L\varepsilon^{-2}/\sqrt{1-\lambda_W}\big)$ communication rounds.
arXiv Detail & Related papers (2022-10-25T11:37:11Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for
Strongly Convex Decentralized Optimization [69.49313819343992]
We extend the widely used EXTRA and DIGing methods with variance reduction (VR), and propose two methods: VR-EXTRA and VR-DIGing.
The proposed VR-EXTRA requires the time of $O\big((\kappa_s+n)\log\frac{1}{\epsilon}\big)$ gradient evaluations and $O\big((\kappa_b+\kappa_c)\log\frac{1}{\epsilon}\big)$ communication rounds.
The proposed VR-DIGing has a slightly higher communication cost of $O\big((\kappa_b+\kappa_c)\log\frac{1}{\epsilon}\big)$.
arXiv Detail & Related papers (2020-09-09T15:48:44Z)
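For reference (see the Q-GenX entry above), a minimal single-worker extra-gradient step for a monotone variational inequality looks as follows. The operator F below is a made-up affine monotone example, and the quantization and adaptive compression that Q-GenX adds for communication efficiency are not modeled.

import numpy as np

# Extra-gradient sketch for a monotone VI: find z* with <F(z*), z - z*> >= 0 for all z.
# F(z) = M z + q is a made-up affine operator whose symmetric part is positive definite,
# so it is strongly monotone and the iteration converges to the unique solution.
rng = np.random.default_rng(1)
G = rng.normal(size=(6, 6))
M = 0.2 * (G - G.T) + 0.5 * np.eye(6)   # skew part plus positive definite symmetric part
q = rng.normal(size=6)
F = lambda z: M @ z + q

z, eta = np.zeros(6), 0.2
for _ in range(2000):
    z_half = z - eta * F(z)             # extrapolation (look-ahead) step
    z = z - eta * F(z_half)             # update using the operator at the look-ahead point
print("residual ||F(z)||:", np.linalg.norm(F(z)))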