Linear Last-iterate Convergence in Constrained Saddle-point Optimization
- URL: http://arxiv.org/abs/2006.09517v3
- Date: Fri, 19 Mar 2021 18:39:55 GMT
- Title: Linear Last-iterate Convergence in Constrained Saddle-point Optimization
- Authors: Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo
- Abstract summary: We significantly expand the understanding of last-rate uniqueness for Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weights Update (OMWU)
We show that when the equilibrium is unique, linear lastiterate convergence is achieved with a learning rate whose value is set to a universal constant.
We show that bilinear games over any polytope satisfy this condition and OGDA converges exponentially fast even without the unique equilibrium assumption.
- Score: 48.44657553192801
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative
Weights Update (OMWU) for saddle-point optimization have received growing
attention due to their favorable last-iterate convergence. However, their
behaviors for simple bilinear games over the probability simplex are still not
fully understood - previous analysis lacks explicit convergence rates, only
applies to an exponentially small learning rate, or requires additional
assumptions such as the uniqueness of the optimal solution. In this work, we
significantly expand the understanding of last-iterate convergence for OGDA and
OMWU in the constrained setting. Specifically, for OMWU in bilinear games over
the simplex, we show that when the equilibrium is unique, linear last-iterate
convergence is achieved with a learning rate whose value is set to a universal
constant, improving the result of (Daskalakis & Panageas, 2019b) under the same
assumption. We then significantly extend the results to more general objectives
and feasible sets for the projected OGDA algorithm, by introducing a sufficient
condition under which OGDA exhibits concrete last-iterate convergence rates
with a constant learning rate whose value only depends on the smoothness of the
objective function. We show that bilinear games over any polytope satisfy this
condition and OGDA converges exponentially fast even without the unique
equilibrium assumption. Our condition also holds for
strongly-convex-strongly-concave functions, recovering the result of (Hsieh et
al., 2019). Finally, we provide experimental results to further support our
theory.
Related papers
- A Unified Analysis for Finite Weight Averaging [50.75116992029417]
Averaging iterations of Gradient Descent (SGD) have achieved empirical success in training deep learning models, such as Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA)
In this paper, we generalize LAWA as Finite Weight Averaging (FWA) and explain their advantages compared to SGD from the perspective of optimization and generalization.
arXiv Detail & Related papers (2024-11-20T10:08:22Z) - Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods [25.831462008050387]
The Gradient Descent (SGD) algorithm has triggered people's interest due to its good performance in practice but lack of theoretical understanding.
It still remains unclear whether the lastiterate convergence can be provably extended to wider composite optimization and non-Euclidean norms.
arXiv Detail & Related papers (2023-12-13T21:41:06Z) - Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, textsfAdaOGD, that does not require a priori knowledge of these parameters.
arXiv Detail & Related papers (2023-10-21T18:38:13Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - The Power of Regularization in Solving Extensive-Form Games [22.557806157585834]
We propose a series of new algorithms based on regularizing the payoff functions of the game.
In particular, we first show that dilated optimistic mirror descent (DOMD) can achieve a fast $tilde O(T)$ last-iterate convergence.
We also show that Reg-CFR, with a variant of optimistic mirror descent algorithm as regret-minimizer, can achieve $O(T1/4)$ best-iterate, and $O(T3/4)$ average-iterate convergence rate.
arXiv Detail & Related papers (2022-06-19T22:10:38Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - The Role of Momentum Parameters in the Optimal Convergence of Adaptive
Polyak's Heavy-ball Methods [12.93796690939018]
We prove that the adaptive Polyak's Heavy-ball (HB) method attains an optimal individual convergence rate of $O(frac1sqrtt)$.
Our new analysis shows how the HB momentum and its time-varying weight help us to achieve the acceleration in convex optimization.
arXiv Detail & Related papers (2021-02-15T02:57:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.