Gradient Descent-Ascent Provably Converges to Strict Local Minmax
Equilibria with a Finite Timescale Separation
- URL: http://arxiv.org/abs/2009.14820v1
- Date: Wed, 30 Sep 2020 17:51:28 GMT
- Title: Gradient Descent-Ascent Provably Converges to Strict Local Minmax
Equilibria with a Finite Timescale Separation
- Authors: Tanner Fiez, Lillian Ratliff
- Abstract summary: We study the role that a finite timescale separation parameter $\tau$ has on gradient descent-ascent in two-player non-convex, non-concave zero-sum games.
We empirically demonstrate the significant impact timescale separation has on training performance.
- Score: 11.091975655053547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the role that a finite timescale separation parameter $\tau$ has on
gradient descent-ascent in two-player non-convex, non-concave zero-sum games
where the learning rate of player 1 is denoted by $\gamma_1$ and the learning
rate of player 2 is defined to be $\gamma_2=\tau\gamma_1$. Existing work
analyzing the role of timescale separation in gradient descent-ascent has
primarily focused on the edge cases of players sharing a learning rate ($\tau
=1$) and the maximizing player approximately converging between each update of
the minimizing player ($\tau \rightarrow \infty$). For the parameter choice of
$\tau=1$, it is known that the learning dynamics are not guaranteed to converge
to a game-theoretically meaningful equilibrium in general. In contrast, Jin et
al. (2020) showed that the stable critical points of gradient descent-ascent
coincide with the set of strict local minmax equilibria as
$\tau\rightarrow\infty$. In this work, we bridge the gap between past work by
showing there exists a finite timescale separation parameter $\tau^{\ast}$ such
that $x^{\ast}$ is a stable critical point of gradient descent-ascent for all
$\tau \in (\tau^{\ast}, \infty)$ if and only if it is a strict local minmax
equilibrium. Moreover, we provide an explicit construction for computing
$\tau^{\ast}$ along with corresponding convergence rates and results under
deterministic and stochastic gradient feedback. The convergence results we
present are complemented by a non-convergence result: given a critical point
$x^{\ast}$ that is not a strict local minmax equilibrium, then there exists a
finite timescale separation $\tau_0$ such that $x^{\ast}$ is unstable for all
$\tau\in (\tau_0, \infty)$. Finally, we empirically demonstrate on the CIFAR-10
and CelebA datasets the significant impact timescale separation has on training
performance.
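As an illustrative sketch (not the authors' code), the snippet below simulates $\tau$-scaled gradient descent-ascent, where player 1 uses learning rate $\gamma_1$ and player 2 uses $\gamma_2=\tau\gamma_1$, on an assumed toy quadratic zero-sum game. It also checks linear stability of the game's strict local minmax equilibrium by inspecting the eigenvalues of the Jacobian of the limiting dynamics as $\tau$ varies. The objective, step size, and $\tau$ grid are illustrative assumptions chosen so that a finite threshold $\tau^{\ast}$ is visible.

```python
# Minimal sketch (not from the paper) of gradient descent-ascent with a finite
# timescale separation tau: player 1 uses learning rate gamma1 and player 2
# uses gamma2 = tau * gamma1.  The quadratic game below is an assumed toy
# example, f(x, y) = -x^2/2 + 2*x*y - y^2/2, whose critical point (0, 0) is a
# strict local minmax equilibrium (d2f/dy2 = -1 < 0 and the Schur complement
# d2f/dx2 - d2f/dxdy * (d2f/dy2)^{-1} * d2f/dydx = 3 > 0), yet GDA with a
# shared learning rate (tau = 1) does not locally converge to it.
import numpy as np

# Hessian blocks of the toy objective at the critical point.
A = np.array([[-1.0]])   # d^2 f / dx^2
B = np.array([[2.0]])    # d^2 f / dx dy
C = np.array([[-1.0]])   # d^2 f / dy^2

def grad(x, y):
    """Gradients of f(x, y) = -x^2/2 + 2xy - y^2/2."""
    return -x + 2.0 * y, 2.0 * x - y

def gda(x, y, gamma1=0.01, tau=2.0, steps=20_000):
    """tau-scaled gradient descent-ascent: x descends, y ascends."""
    gamma2 = tau * gamma1
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - gamma1 * gx, y + gamma2 * gy
    return x, y

def gda_jacobian(tau):
    """Jacobian of the limiting vector field (-grad_x f, tau * grad_y f)."""
    top = np.hstack([-A, -B])
    bot = np.hstack([tau * B.T, tau * C])
    return np.vstack([top, bot])

def is_stable(tau):
    """Linear stability: every Jacobian eigenvalue has negative real part."""
    return bool(np.all(np.linalg.eigvals(gda_jacobian(tau)).real < 0))

# Scan tau: in this toy game the critical point is unstable for tau <= 1 and
# stable for every tau > 1, illustrating a finite threshold tau*.
for tau in [0.5, 1.0, 2.0, 5.0, 25.0]:
    x_T, y_T = gda(0.5, 0.5, tau=tau)
    print(f"tau = {tau:5.1f}  stable = {is_stable(tau)}  "
          f"final iterate ~ ({x_T:+.3e}, {y_T:+.3e})")
```

In this assumed example the equilibrium is unstable for $\tau \le 1$ and stable for every $\tau > 1$, mirroring the paper's claim that stability of gradient descent-ascent for all $\tau \in (\tau^{\ast}, \infty)$ characterizes strict local minmax equilibria.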
Related papers
- Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit
Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses.
We propose a new algorithm that attains an $\widetilde{O}(d\sqrt{HS^3K} + \sqrt{HSAK})$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z) - Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution.
Our goal is to efficiently select a single high quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$ of being among the top $\eta$-fraction of arms.
arXiv Detail & Related papers (2023-06-03T04:00:47Z) - Local Convergence of Gradient Methods for Min-Max Games: Partial
Curvature Generically Suffices [0.5439020425819]
We study the convergence to local Nash equilibria of gradient methods for two-player zero-sum differentiable games.
We show that thanks to partial curvature, conic particle methods -- which optimize over both weights and supports of the mixed strategies -- generically converge faster than fixed-support methods.
arXiv Detail & Related papers (2023-05-26T21:43:56Z) - Depth Dependence of $\mu$P Learning Rates in ReLU MLPs [72.14317069090407]
We study the dependence on $n$ and $L$ of the maximal update ($\mu$P) learning rate.
We find that it has a non-trivial dependence on $L$, scaling like $L^{-3/2}$.
arXiv Detail & Related papers (2023-05-13T01:10:49Z) - Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD [38.221784575853796]
This work considers the problem of finding a first-order stationary point of a non-convex function with potentially unbounded smoothness using stochastic gradients.
We develop a technique that allows us to prove $\mathcal{O}\big(\frac{\mathrm{polylog}(T)\,\sigma}{\sqrt{T}}\big)$ convergence rates without assuming uniform bounds on the noise.
arXiv Detail & Related papers (2023-02-13T18:13:36Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We study asynchronous SGD, a training algorithm for distributed computation in which workers operate with varying communication frequency.
In this work, we obtain a tighter convergence rate that depends on the noise level $\sigma^2$ and the average delay $\tau_{avg}$.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games [18.832436856339587]
This paper makes progress towards learning Nash equilibria in two-player zero-sum Markov games from offline data.
We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds -- called VI-LCB-Game.
arXiv Detail & Related papers (2022-06-08T17:58:06Z) - Preconditioned Gradient Descent for Overparameterized Nonconvex
Burer--Monteiro Factorization with Global Optimality Certification [14.674494335647841]
We consider using gradient descent to minimize the nonconvex function $f(X)=\phi(XX^T)$ over an $n\times r$ factor matrix $X$.
We propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear.
arXiv Detail & Related papers (2022-06-07T14:26:49Z) - Nearly Horizon-Free Offline Reinforcement Learning [97.36751930393245]
We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes with $S$ states, $A$ actions and planning horizon $H$.
We obtain the first set of nearly $H$-free sample complexity bounds for evaluation and planning using the empirical MDPs.
arXiv Detail & Related papers (2021-03-25T18:52:17Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov
Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)