SGDA with shuffling: faster convergence for nonconvex-P{\L} minimax
optimization
- URL: http://arxiv.org/abs/2210.05995v1
- Date: Wed, 12 Oct 2022 08:05:41 GMT
- Title: SGDA with shuffling: faster convergence for nonconvex-P{\L} minimax
optimization
- Authors: Hanseul Cho, Chulhee Yun
- Abstract summary: We present a theoretical analysis of stochastic gradient descent-ascent (SGDA) with random reshuffling (SGDA-RR) for solving minimax optimization problems.
We analyze both simultaneous and alternating SGDA-RR for nonconvex-P{\L} and primal-P{\L}-P{\L} objectives with Polyak-{\L}ojasiewicz (P{\L}) geometry.
Our rates also extend to mini-batch SGDA-RR, recovering known rates for full-batch gradient descent-ascent (GDA).
Finally, we present a comprehensive lower bound for two-time-scale GDA, which matches the full-batch rate for the primal-P{\L}-P{\L} case.
- Score: 18.668531108219415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient descent-ascent (SGDA) is one of the main workhorses for
solving finite-sum minimax optimization problems. Most practical
implementations of SGDA randomly reshuffle components and sequentially use them
(i.e., without-replacement sampling); however, there are few theoretical
results on this approach for minimax algorithms, especially outside the
easier-to-analyze (strongly-)monotone setups. To narrow this gap, we study the
convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth
nonconvex-nonconcave objectives with Polyak-{\L}ojasiewicz (P{\L}) geometry. We
analyze both simultaneous and alternating SGDA-RR for nonconvex-P{\L} and
primal-P{\L}-P{\L} objectives, and obtain convergence rates faster than
with-replacement SGDA. Our rates also extend to mini-batch SGDA-RR, recovering
known rates for full-batch gradient descent-ascent (GDA). Lastly, we present a
comprehensive lower bound for two-time-scale GDA, which matches the full-batch
rate for the primal-P{\L}-P{\L} case.
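To make the algorithm concrete, below is a minimal Python sketch of simultaneous and alternating SGDA with random reshuffling on a toy finite-sum objective. The quadratic components, step sizes, and function names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Toy finite-sum minimax objective: f(x, y) = (1/n) sum_i f_i(x, y),
# with f_i(x, y) = 0.5*a_i*x^2 + b_i*x*y - 0.5*c_i*y^2 (illustrative only).
rng = np.random.default_rng(0)
n = 100
a, b, c = rng.uniform(1, 2, n), rng.uniform(-1, 1, n), rng.uniform(1, 2, n)

def grad_x(i, x, y):  # gradient of f_i w.r.t. the min variable x
    return a[i] * x + b[i] * y

def grad_y(i, x, y):  # gradient of f_i w.r.t. the max variable y
    return b[i] * x - c[i] * y

def sgda_rr(x, y, alpha=0.01, beta=0.02, epochs=50, alternating=False):
    """SGDA with random reshuffling: each epoch visits every component
    exactly once, in a fresh random order (without-replacement sampling)."""
    for _ in range(epochs):
        perm = rng.permutation(n)          # reshuffle once per epoch
        for i in perm:
            if alternating:
                x_new = x - alpha * grad_x(i, x, y)   # descent step first
                y = y + beta * grad_y(i, x_new, y)    # ascent uses updated x
                x = x_new
            else:  # simultaneous: both steps use the same iterate
                x, y = x - alpha * grad_x(i, x, y), y + beta * grad_y(i, x, y)
    return x, y

print(sgda_rr(1.0, 1.0))                    # simultaneous SGDA-RR
print(sgda_rr(1.0, 1.0, alternating=True))  # alternating SGDA-RR
```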
Related papers
- Langevin Dynamics: A Unified Perspective on Optimization via Lyapunov Potentials [15.718093624695552]
We analyze the convergence of Stochastic Gradient Langevin Dynamics (SGLD) to global minima based on Lyapunov potentials.
We provide 1) improved guarantees for SGLD for optimization in the settings of previous works, 2) the first finite gradient complexity bounds for SGLD, and 3) a proof that if continuous-time Langevin Dynamics succeeds for optimization, then discrete-time SGLD succeeds under mild regularity assumptions.
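For reference, here is a minimal sketch of the SGLD update rule on a toy non-convex potential; the potential, step size, and inverse temperature are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    # Toy non-convex potential f(x) = (x^2 - 1)^2 (illustrative choice)
    return 4.0 * x * (x**2 - 1.0)

def sgld(x, eta=1e-3, beta=10.0, steps=10_000):
    """SGLD: a gradient step plus Gaussian noise scaled by the inverse
    temperature beta; the injected noise lets iterates escape bad minima."""
    for _ in range(steps):
        noise = rng.normal()
        x = x - eta * grad_f(x) + np.sqrt(2.0 * eta / beta) * noise
    return x

print(sgld(x=2.0))  # typically ends near one of the global minima x = +-1
```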
arXiv Detail & Related papers (2024-07-05T05:34:10Z) - A Communication-efficient Algorithm with Linear Convergence for
Federated Minimax Learning [1.713291434132985]
We study a large-scale multi-agent minimax optimization problem, which models Generative Adversarial Networks (GANs).
The overall objective is a sum of agents' private local objective functions.
We show that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-GDA solution.
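The paper's exact algorithm is not reproduced here; the following is a hedged sketch of one plausible form of federated GDA with a gradient-tracking-style drift correction, on toy quadratic agent objectives (all names and constants are assumptions).

```python
import numpy as np

# Two agents with quadratic local objectives f_i(x, y) (illustrative).
A = [1.5, 0.5]; B = [0.3, -0.2]; C = [1.0, 2.0]

def gx(i, x, y): return A[i] * x + B[i] * y     # d f_i / dx
def gy(i, x, y): return B[i] * x - C[i] * y     # d f_i / dy

def fedgda_gt(x, y, eta=0.05, K=5, rounds=200):
    """Federated GDA with a gradient-tracking-style correction: during
    local steps each agent adds (global grad - its own grad), both taken
    at the start of the round, so drift toward its own optimum is
    cancelled (one plausible realization, not the paper's exact method)."""
    m = len(A)
    for _ in range(rounds):
        Gx = np.mean([gx(i, x, y) for i in range(m)])  # global gradients
        Gy = np.mean([gy(i, x, y) for i in range(m)])  # at the sync point
        xs, ys = [], []
        for i in range(m):
            cx, cy = Gx - gx(i, x, y), Gy - gy(i, x, y)  # tracking correction
            xi, yi = x, y
            for _ in range(K):
                xi, yi = xi - eta * (gx(i, xi, yi) + cx), \
                         yi + eta * (gy(i, xi, yi) + cy)
            xs.append(xi); ys.append(yi)
        x, y = float(np.mean(xs)), float(np.mean(ys))  # server averaging
    return x, y

print(fedgda_gt(1.0, 1.0))
```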
arXiv Detail & Related papers (2022-06-02T16:31:16Z) - Differentially Private SGDA for Minimax Problems [83.57322009102973]
We prove that stochastic gradient descent ascent (SGDA) can achieve optimal utility in terms of weak primal-dual population risk.
This is the first known result for the non-smooth strongly-concave setting.
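Differential privacy for SGDA is typically obtained by per-sample gradient clipping plus Gaussian noise; a minimal sketch under that standard recipe follows (the clip norm, noise multiplier, and toy objective are illustrative assumptions, not the paper's exact mechanism).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
data = rng.normal(size=n)  # toy per-sample data (illustrative)

def grads(i, x, y):
    # Per-sample gradients of a toy f_i(x, y) = (x - data[i])*y - y^2/2
    return y, (x - data[i]) - y

def dp_sgda(x, y, lr=0.05, clip=1.0, sigma=0.8, steps=2000):
    """DP-SGDA sketch: clip each per-sample gradient to norm `clip`,
    then add Gaussian noise with std sigma*clip before the GDA step."""
    for _ in range(steps):
        i = rng.integers(n)
        g_x, g_y = grads(i, x, y)
        norm = max(1.0, np.hypot(g_x, g_y) / clip)  # clipping factor
        g_x, g_y = g_x / norm, g_y / norm
        g_x += sigma * clip * rng.normal()          # Gaussian mechanism
        g_y += sigma * clip * rng.normal()
        x, y = x - lr * g_x, y + lr * g_y
    return x, y

print(dp_sgda(0.0, 0.0))
```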
arXiv Detail & Related papers (2022-01-22T13:05:39Z) - Randomized Stochastic Gradient Descent Ascent [37.887266927498395]
An increasing number of machine learning problems, such as robust or adversarial variants of existing algorithms, require minimizing a loss function that is itself defined as a maximum.
We propose RSGDA (Randomized SGDA), a variant of Epoch SGDA (ESGDA) with a random loop size and a simpler theoretical analysis.
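ESGDA runs a fixed-length inner loop of ascent steps between descent steps; RSGDA replaces the fixed loop with a coin flip. A minimal sketch of that randomization (the probability, step size, and toy objective are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_x(x, y): return x + 0.5 * y   # toy df/dx (illustrative)
def grad_y(x, y): return 0.5 * x - y   # toy df/dy

def rsgda(x, y, p=0.2, lr=0.05, steps=5000):
    """RSGDA sketch: instead of ESGDA's fixed-length inner ascent loop,
    take a descent step with probability p and an ascent step otherwise,
    so the expected number of ascent steps per descent step is (1-p)/p."""
    for _ in range(steps):
        if rng.random() < p:
            x = x - lr * grad_x(x, y)  # occasional descent step
        else:
            y = y + lr * grad_y(x, y)  # usual ascent step
    return x, y

print(rsgda(1.0, 1.0))
```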
arXiv Detail & Related papers (2021-11-25T16:44:19Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Proximal Gradient Descent-Ascent: Variable Convergence under K{\L}
Geometry [49.65455534654459]
The gradient descent-ascent (GDA) algorithm has been widely applied to solve minimax optimization problems.
This paper fills such a gap by studying the variable convergence of proximal GDA under Kurdyka-{\L}ojasiewicz (K{\L}) geometry.
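In the composite setting such papers study, the descent step is a proximal step on the nonsmooth part of the objective. A minimal sketch with an $\ell_1$ regularizer (the regularizer, step sizes, and toy objective are illustrative assumptions):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*|v| (the l1 regularizer used here)."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def grad_x(x, y): return x + y         # toy df/dx (illustrative)
def grad_y(x, y): return x - 2.0 * y   # toy df/dy

def prox_gda(x, y, alpha=0.1, beta=0.1, lam=0.05, iters=500):
    """Proximal GDA: a gradient step on the smooth part followed by the
    prox of the nonsmooth regularizer g(x) = lam*|x|; ascent step on y."""
    for _ in range(iters):
        x = soft_threshold(x - alpha * grad_x(x, y), alpha * lam)
        y = y + beta * grad_y(x, y)    # ascent uses the updated x
    return x, y

print(prox_gda(1.0, 1.0))
```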
arXiv Detail & Related papers (2021-02-09T05:35:53Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, up to an error floor.
Rather than fixing the minibatch and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
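One concrete instance of this trade-off is to keep the step-size constant and grow the minibatch, so that the variance of the gradient estimate attenuates instead of the step-size. A minimal sketch (the geometric growth rule and toy loss are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
data = rng.normal(loc=3.0, size=n)      # toy data; minimize E[(x - d)^2]/2

def adaptive_batch_sgd(x, lr=0.5, batch=2, growth=1.05, steps=200):
    """Constant step-size with an adaptively growing minibatch: early
    iterations use small, cheap batches; later ones use larger batches
    so the gradient-noise variance shrinks instead of the step-size."""
    for _ in range(steps):
        idx = rng.choice(n, size=int(batch), replace=False)
        g = np.mean(x - data[idx])      # minibatch gradient of the toy loss
        x = x - lr * g
        batch = min(n, batch * growth)  # grow the batch geometrically
    return x

print(adaptive_batch_sgd(0.0))          # approaches the data mean (about 3)
```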
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for a broad range of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - Minibatch vs Local SGD for Heterogeneous Distributed Learning [28.80878557506603]
We argue that Minibatch SGD dominates all existing analyses of Local SGD in this setting.
We present the first upper bound for Local SGD that improves over Minibatch SGD in a non-homogeneous regime.
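To fix terminology, here is a minimal sketch contrasting the two baselines on a toy heterogeneous problem: minibatch SGD averages gradients at every step, while local SGD averages iterates only after K local steps (the objectives and parameters are illustrative assumptions).

```python
import numpy as np

# Two machines with heterogeneous quadratic losses f_i(x) = (x - m_i)^2 / 2.
M = np.array([-1.0, 3.0])               # local minima differ across machines

def minibatch_sgd(x, lr=0.1, rounds=100):
    """Every machine computes a gradient at the shared iterate and the
    average gradient is applied once (one communication per step)."""
    for _ in range(rounds):
        x = x - lr * np.mean(x - M)
    return x

def local_sgd(x, lr=0.1, K=10, rounds=100):
    """Each machine takes K local steps on its own objective, then the
    iterates are averaged (one communication per K steps, but local
    iterates drift toward their own minima on heterogeneous data)."""
    for _ in range(rounds):
        locals_ = []
        for m in M:
            xi = x
            for _ in range(K):
                xi = xi - lr * (xi - m)
            locals_.append(xi)
        x = float(np.mean(locals_))
    return x

print(minibatch_sgd(0.0), local_sgd(0.0))  # both approach the mean of M here
```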
arXiv Detail & Related papers (2020-06-08T16:40:49Z)