SGDA with shuffling: faster convergence for nonconvex-P{\L} minimax
optimization
- URL: http://arxiv.org/abs/2210.05995v1
- Date: Wed, 12 Oct 2022 08:05:41 GMT
- Title: SGDA with shuffling: faster convergence for nonconvex-P{\L} minimax
optimization
- Authors: Hanseul Cho, Chulhee Yun
- Abstract summary: We present a theoretical analysis of stochastic gradient descent-ascent (SGDA) with random reshuffling (SGDA-RR) for solving minimax optimization problems.
We analyze both simultaneous and alternating SGDA-RR for nonconvex-P{\L} and primal-P{\L}-P{\L} objectives with Polyak-{\L}ojasiewicz (P{\L}) geometry.
Our rates also extend to mini-batch SGDA-RR, recovering known rates for full-batch gradient descent-ascent (GDA).
Finally, we present a comprehensive lower bound for two-time-scale GDA, which matches the full-batch rate for the primal-P{\L}-P{\L} case.
- Score: 18.668531108219415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient descent-ascent (SGDA) is one of the main workhorses for
solving finite-sum minimax optimization problems. Most practical
implementations of SGDA randomly reshuffle components and sequentially use them
(i.e., without-replacement sampling); however, there are few theoretical
results on this approach for minimax algorithms, especially outside the
easier-to-analyze (strongly-)monotone setups. To narrow this gap, we study the
convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth
nonconvex-nonconcave objectives with Polyak-{\L}ojasiewicz (P{\L}) geometry. We
analyze both simultaneous and alternating SGDA-RR for nonconvex-P{\L} and
primal-P{\L}-P{\L} objectives, and obtain convergence rates faster than
with-replacement SGDA. Our rates also extend to mini-batch SGDA-RR, recovering
known rates for full-batch gradient descent-ascent (GDA). Lastly, we present a
comprehensive lower bound for two-time-scale GDA, which matches the full-batch
rate for the primal-P{\L}-P{\L} case.
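To make the algorithm concrete, below is a minimal Python sketch of simultaneous and alternating SGDA with random reshuffling on a toy finite-sum objective. The quadratic components, step sizes, and function names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Toy finite-sum minimax objective: f(x, y) = (1/n) sum_i f_i(x, y),
# with f_i(x, y) = 0.5*a_i*x^2 + b_i*x*y - 0.5*c_i*y^2 (illustrative only).
rng = np.random.default_rng(0)
n = 100
a, b, c = rng.uniform(1, 2, n), rng.uniform(-1, 1, n), rng.uniform(1, 2, n)

def grad_x(i, x, y):  # gradient of f_i w.r.t. the min variable x
    return a[i] * x + b[i] * y

def grad_y(i, x, y):  # gradient of f_i w.r.t. the max variable y
    return b[i] * x - c[i] * y

def sgda_rr(x, y, alpha=0.01, beta=0.02, epochs=50, alternating=False):
    """SGDA with random reshuffling: each epoch visits every component
    exactly once, in a fresh random order (without-replacement sampling)."""
    for _ in range(epochs):
        perm = rng.permutation(n)          # reshuffle once per epoch
        for i in perm:
            if alternating:
                x_new = x - alpha * grad_x(i, x, y)   # descent step first
                y = y + beta * grad_y(i, x_new, y)    # ascent uses updated x
                x = x_new
            else:  # simultaneous: both steps use the same iterate
                x, y = x - alpha * grad_x(i, x, y), y + beta * grad_y(i, x, y)
    return x, y

print(sgda_rr(1.0, 1.0))                    # simultaneous SGDA-RR
print(sgda_rr(1.0, 1.0, alternating=True))  # alternating SGDA-RR
```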
Related papers
- Langevin Dynamics: A Unified Perspective on Optimization via Lyapunov Potentials [15.718093624695552]
We analyze the convergence of Stochastic Gradient Langevin Dynamics (SGLD) to global minima based on Lyapunov potentials.
We provide 1) improved guarantees for SGLD for optimization in the settings of previous works, 2) the first finite gradient complexity bounds for SGLD, and 3) a proof that if continuous-time Langevin Dynamics succeeds for optimization, then discrete-time SGLD succeeds under mild regularity assumptions.
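For reference, here is a minimal sketch of the SGLD update rule on a toy non-convex potential; the potential, step size, and inverse temperature are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    # Toy non-convex potential f(x) = (x^2 - 1)^2 (illustrative choice)
    return 4.0 * x * (x**2 - 1.0)

def sgld(x, eta=1e-3, beta=10.0, steps=10_000):
    """SGLD: a gradient step plus Gaussian noise scaled by the inverse
    temperature beta; the injected noise lets iterates escape bad minima."""
    for _ in range(steps):
        noise = rng.normal()
        x = x - eta * grad_f(x) + np.sqrt(2.0 * eta / beta) * noise
    return x

print(sgld(x=2.0))  # typically ends near one of the global minima x = +-1
```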
arXiv Detail & Related papers (2024-07-05T05:34:10Z) - A Communication-efficient Algorithm with Linear Convergence for
Federated Minimax Learning [1.713291434132985]
We study a large-scale multi-agent minimax optimization problem, which models Generative Adversarial Networks (GANs).
The overall objective is a sum of agents' private local objective functions.
We show that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-GDA solution.
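The paper's exact algorithm is not reproduced here; the following is a hedged sketch of one plausible form of federated GDA with a gradient-tracking-style drift correction, on toy quadratic agent objectives (all names and constants are assumptions).

```python
import numpy as np

# Two agents with quadratic local objectives f_i(x, y) (illustrative).
A = [1.5, 0.5]; B = [0.3, -0.2]; C = [1.0, 2.0]

def gx(i, x, y): return A[i] * x + B[i] * y     # d f_i / dx
def gy(i, x, y): return B[i] * x - C[i] * y     # d f_i / dy

def fedgda_gt(x, y, eta=0.05, K=5, rounds=200):
    """Federated GDA with a gradient-tracking-style correction: during
    local steps each agent adds (global grad - its own grad), both taken
    at the start of the round, so drift toward its own optimum is
    cancelled (one plausible realization, not the paper's exact method)."""
    m = len(A)
    for _ in range(rounds):
        Gx = np.mean([gx(i, x, y) for i in range(m)])  # global gradients
        Gy = np.mean([gy(i, x, y) for i in range(m)])  # at the sync point
        xs, ys = [], []
        for i in range(m):
            cx, cy = Gx - gx(i, x, y), Gy - gy(i, x, y)  # tracking correction
            xi, yi = x, y
            for _ in range(K):
                xi, yi = xi - eta * (gx(i, xi, yi) + cx), \
                         yi + eta * (gy(i, xi, yi) + cy)
            xs.append(xi); ys.append(yi)
        x, y = float(np.mean(xs)), float(np.mean(ys))  # server averaging
    return x, y

print(fedgda_gt(1.0, 1.0))
```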
arXiv Detail & Related papers (2022-06-02T16:31:16Z) - Differentially Private SGDA for Minimax Problems [83.57322009102973]
We prove that stochastic gradient descent ascent (SGDA) can achieve optimal utility in terms of weak primal-dual population risk.
This is the first known result for the non-smooth strongly-concave setting.
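Differential privacy for SGDA is typically obtained by per-sample gradient clipping plus Gaussian noise; a minimal sketch under that standard recipe follows (the clip norm, noise multiplier, and toy objective are illustrative assumptions, not the paper's exact mechanism).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
data = rng.normal(size=n)  # toy per-sample data (illustrative)

def grads(i, x, y):
    # Per-sample gradients of a toy f_i(x, y) = (x - data[i])*y - y^2/2
    return y, (x - data[i]) - y

def dp_sgda(x, y, lr=0.05, clip=1.0, sigma=0.8, steps=2000):
    """DP-SGDA sketch: clip each per-sample gradient to norm `clip`,
    then add Gaussian noise with std sigma*clip before the GDA step."""
    for _ in range(steps):
        i = rng.integers(n)
        g_x, g_y = grads(i, x, y)
        norm = max(1.0, np.hypot(g_x, g_y) / clip)  # clipping factor
        g_x, g_y = g_x / norm, g_y / norm
        g_x += sigma * clip * rng.normal()          # Gaussian mechanism
        g_y += sigma * clip * rng.normal()
        x, y = x - lr * g_x, y + lr * g_y
    return x, y

print(dp_sgda(0.0, 0.0))
```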
arXiv Detail & Related papers (2022-01-22T13:05:39Z) - Randomized Stochastic Gradient Descent Ascent [37.887266927498395]
An increasing number of machine learning problems, such as robust or adversarial variants of existing algorithms, require minimizing a loss function that is itself defined as a maximum.
We propose RSGDA (Randomized SGDA), a variant of Epoch SGDA (ESGDA) with a random loop size and a simpler theoretical analysis.
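ESGDA runs a fixed-length inner loop of ascent steps between descent steps; RSGDA replaces the fixed loop with a coin flip. A minimal sketch of that randomization (the probability, step size, and toy objective are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_x(x, y): return x + 0.5 * y   # toy df/dx (illustrative)
def grad_y(x, y): return 0.5 * x - y   # toy df/dy

def rsgda(x, y, p=0.2, lr=0.05, steps=5000):
    """RSGDA sketch: instead of ESGDA's fixed-length inner ascent loop,
    take a descent step with probability p and an ascent step otherwise,
    so the expected number of ascent steps per descent step is (1-p)/p."""
    for _ in range(steps):
        if rng.random() < p:
            x = x - lr * grad_x(x, y)  # occasional descent step
        else:
            y = y + lr * grad_y(x, y)  # usual ascent step
    return x, y

print(rsgda(1.0, 1.0))
```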
arXiv Detail & Related papers (2021-11-25T16:44:19Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Proximal Gradient Descent-Ascent: Variable Convergence under K{\L}
Geometry [49.65455534654459]
The gradient descent-ascent (GDA) algorithm has been widely applied to solve minimax optimization problems.
This paper fills such a gap by studying the variable convergence of proximal GDA under Kurdyka-{\L}ojasiewicz (K{\L}) geometry.
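In the composite setting such papers study, the descent step is a proximal step on the nonsmooth part of the objective. A minimal sketch with an $\ell_1$ regularizer (the regularizer, step sizes, and toy objective are illustrative assumptions):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*|v| (the l1 regularizer used here)."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def grad_x(x, y): return x + y         # toy df/dx (illustrative)
def grad_y(x, y): return x - 2.0 * y   # toy df/dy

def prox_gda(x, y, alpha=0.1, beta=0.1, lam=0.05, iters=500):
    """Proximal GDA: a gradient step on the smooth part followed by the
    prox of the nonsmooth regularizer g(x) = lam*|x|; ascent step on y."""
    for _ in range(iters):
        x = soft_threshold(x - alpha * grad_x(x, y), alpha * lam)
        y = y + beta * grad_y(x, y)    # ascent uses the updated x
    return x, y

print(prox_gda(1.0, 1.0))
```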
arXiv Detail & Related papers (2021-02-09T05:35:53Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, up to an error floor.
Rather than fixing the minibatch and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
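One concrete instance of this trade-off is to keep the step-size constant and grow the minibatch, so that the variance of the gradient estimate attenuates instead of the step-size. A minimal sketch (the geometric growth rule and toy loss are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
data = rng.normal(loc=3.0, size=n)      # toy data; minimize E[(x - d)^2]/2

def adaptive_batch_sgd(x, lr=0.5, batch=2, growth=1.05, steps=200):
    """Constant step-size with an adaptively growing minibatch: early
    iterations use small, cheap batches; later ones use larger batches
    so the gradient-noise variance shrinks instead of the step-size."""
    for _ in range(steps):
        idx = rng.choice(n, size=int(batch), replace=False)
        g = np.mean(x - data[idx])      # minibatch gradient of the toy loss
        x = x - lr * g
        batch = min(n, batch * growth)  # grow the batch geometrically
    return x

print(adaptive_batch_sgd(0.0))          # approaches the data mean (about 3)
```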
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for a broad range of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - Minibatch vs Local SGD for Heterogeneous Distributed Learning [28.80878557506603]
We argue that Minibatch SGD dominates all existing analyses of Local SGD in this setting.
We present the first upper bound for Local SGD that improves over Minibatch SGD in a non-homogeneous regime.
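To fix terminology, here is a minimal sketch contrasting the two baselines on a toy heterogeneous problem: minibatch SGD averages gradients at every step, while local SGD averages iterates only after K local steps (the objectives and parameters are illustrative assumptions).

```python
import numpy as np

# Two machines with heterogeneous quadratic losses f_i(x) = (x - m_i)^2 / 2.
M = np.array([-1.0, 3.0])               # local minima differ across machines

def minibatch_sgd(x, lr=0.1, rounds=100):
    """Every machine computes a gradient at the shared iterate and the
    average gradient is applied once (one communication per step)."""
    for _ in range(rounds):
        x = x - lr * np.mean(x - M)
    return x

def local_sgd(x, lr=0.1, K=10, rounds=100):
    """Each machine takes K local steps on its own objective, then the
    iterates are averaged (one communication per K steps, but local
    iterates drift toward their own minima on heterogeneous data)."""
    for _ in range(rounds):
        locals_ = []
        for m in M:
            xi = x
            for _ in range(K):
                xi = xi - lr * (xi - m)
            locals_.append(xi)
        x = float(np.mean(locals_))
    return x

print(minibatch_sgd(0.0), local_sgd(0.0))  # both approach the mean of M here
```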
arXiv Detail & Related papers (2020-06-08T16:40:49Z)