Local Convergence of Gradient Methods for Min-Max Games: Partial
Curvature Generically Suffices
- URL: http://arxiv.org/abs/2305.17275v2
- Date: Tue, 7 Nov 2023 11:26:17 GMT
- Title: Local Convergence of Gradient Methods for Min-Max Games: Partial
Curvature Generically Suffices
- Authors: Guillaume Wang, Lénaïc Chizat
- Abstract summary: We study the convergence to local Nash equilibria of gradient methods for two-player zero-sum differentiable games.
We show that thanks to partial curvature, conic particle methods -- which optimize over both weights and supports of the mixed strategies -- generically converge faster than fixed-support methods.
- Score: 0.5439020425819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the convergence to local Nash equilibria of gradient methods for
two-player zero-sum differentiable games. It is well-known that such dynamics
converge locally when $S \succ 0$ and may diverge when $S=0$, where $S\succeq
0$ is the symmetric part of the Jacobian at equilibrium that accounts for the
"potential" component of the game. We show that these dynamics also converge as
soon as $S$ is nonzero (partial curvature) and the eigenvectors of the
antisymmetric part $A$ are in general position with respect to the kernel of
$S$. We then study the convergence rates when $S \ll A$ and prove that they
typically depend on the average of the eigenvalues of $S$, instead of the
minimum as an analogy with minimization problems would suggest. To illustrate
our results, we consider the problem of computing mixed Nash equilibria of
continuous games. We show that, thanks to partial curvature, conic particle
methods -- which optimize over both weights and supports of the mixed
strategies -- generically converge faster than fixed-support methods. For
min-max games, it is thus beneficial to add degrees of freedom "with
curvature": this can be interpreted as yet another benefit of
over-parameterization.
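To make the local-convergence claim concrete, the following is a minimal numerical sketch (an illustration, not code from the paper): NumPy, the dimension $n=4$, a rank-one positive-semidefinite $S$, and a randomly drawn antisymmetric $A$ are all illustrative assumptions. It checks that the linearized gradient-play map $z \mapsto z - \eta(S+A)z$ contracts even though $S$ alone leaves most directions flat.

```python
# A minimal sketch (not from the paper): linearized simultaneous gradient play
# z_{k+1} = z_k - eta * (S + A) z_k, where S >= 0 is the symmetric
# ("potential") part of the game Jacobian at equilibrium and A is the
# antisymmetric ("rotational") part. Dimension, rank of S, and the random
# draw of A are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 4

u = rng.standard_normal((n, 1))
S = u @ u.T                         # PSD, rank one: curvature in a single direction
B = rng.standard_normal((n, n))
A = (B - B.T) / 2                   # antisymmetric, generically in general position

J = S + A
lams = np.linalg.eigvals(J)
print("Re(eig(S+A)):", np.round(lams.real, 4))          # generically all > 0

# Pick a step size below the stability threshold 2*Re(lam)/|lam|^2 of every mode.
eta = 0.5 * min(2 * lam.real / abs(lam) ** 2 for lam in lams)
M = np.eye(n) - eta * J
print("spectral radius of I - eta*(S+A):",
      np.max(np.abs(np.linalg.eigvals(M))))              # < 1: local convergence
```

The check mirrors the intuition behind the result: for a unit eigenvector $x$ of $S+A$ with eigenvalue $\lambda$, $\mathrm{Re}\,\lambda = x^*Sx \ge 0$, and this vanishes only if $x$ lies in $\ker S$, i.e., only when an eigenvector of $A$ fails to be in general position with respect to $\ker S$.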
Related papers
- On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes the convergence rate $\frac{1}{T}\sum_{k=1}^T \mathbb{E}\left[\|\nabla f(x_k)\|_1\right] \le O\!\left(\frac{\sqrt{d}}{T^{1/4}}\right)$ measured by the $\ell_1$ norm.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^T \mathbb{E}\left[\|\nabla f(x_k)\|_2^2\right] \le O\!\left(\frac{1}{\sqrt{T}}\right)$ rate of SGD measured by the $\ell_2$ norm.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Doubly Regularized Entropic Wasserstein Barycenters [0.0]
We study a general formulation of regularized Wasserstein barycenters that enjoys favorable regularity, approximation, stability and (grid-free) optimization properties.
arXiv Detail & Related papers (2023-03-21T13:42:43Z) - Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD [38.221784575853796]
This work considers the problem of finding a first-order stationary point of a non-convex function whose smoothness constant may be unbounded, using a stochastic gradient oracle.
We develop a technique that allows us to prove $\mathcal{O}\!\left(\frac{\mathrm{polylog}(T)\,\sigma}{\sqrt{T}}\right)$ convergence rates without assuming uniform bounds on the noise.
arXiv Detail & Related papers (2023-02-13T18:13:36Z) - Forward Looking Best-Response Multiplicative Weights Update Methods [14.519198115835966]
A novel variant of the multiplicative weights update method guarantees last-iterate convergence for zero-sum games with a unique Nash equilibrium.
Our findings reveal that our algorithm offers substantial gains both in terms of the convergence rate and the region of contraction relative to the previous approach.
arXiv Detail & Related papers (2021-06-07T13:03:33Z) - Estimating 2-Sinkhorn Divergence between Gaussian Processes from
Finite-Dimensional Marginals [4.416484585765028]
We study the convergence of estimating the 2-Sinkhorn divergence between Gaussian processes (GPs) using their finite-dimensional marginal distributions.
We show almost sure convergence of the divergence when the marginals are sampled according to some base measure.
arXiv Detail & Related papers (2021-02-05T16:17:55Z) - Gradient Descent-Ascent Provably Converges to Strict Local Minmax
Equilibria with a Finite Timescale Separation [11.091975655053547]
We show the effect that a finite timescale separation parameter $\tau$ has on gradient descent-ascent in non-convex, non-concave zero-sum games; a minimal sketch of such a $\tau$-scaled update appears after this list.
We empirically demonstrate the significant impact timescale separation has on performance.
arXiv Detail & Related papers (2020-09-30T17:51:28Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n\times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$ with $r\ll n$; see the low-rank kernel sketch after this list.
arXiv Detail & Related papers (2020-06-12T10:21:40Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We give a simple proof of convergence covering both the Adam and Adagrad algorithms, with a rate of $O(d\,\ln(N)/\sqrt{N})$.
Adam converges at the same $O(d\,\ln(N)/\sqrt{N})$ rate when used with its default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
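For the gradient descent-ascent entry above, the sketch referenced there follows. It only illustrates a $\tau$-scaled descent-ascent update and is not code from that paper; the toy objective $f(x,y)=\tfrac12 x^2+4xy-\tfrac12 y^2$, the step size $\eta$, and $\tau=10$ are assumptions chosen so that the iterates visibly converge to the saddle point at the origin.

```python
# Hypothetical sketch of gradient descent-ascent with timescale separation tau:
# the minimizing player x takes steps of size eta, the maximizing player y
# takes steps of size tau * eta.
import numpy as np

def grad_f(x, y):
    # Toy zero-sum objective f(x, y) = 0.5*x**2 + 4*x*y - 0.5*y**2 (illustrative).
    return np.array([x + 4.0 * y, 4.0 * x - y])   # (df/dx, df/dy)

def tau_gda(x, y, eta=0.01, tau=10.0, steps=2000):
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y + tau * eta * gy    # descent on x, tau-faster ascent on y
    return x, y

print(tau_gda(1.0, -1.0))   # approaches the unique saddle point (0, 0)
```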
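For the positive-features Sinkhorn entry, the low-rank kernel sketch referenced there follows. It is an illustration rather than that paper's code: the positive feature maps are random vectors in $\mathbb{R}^r_+$, the marginals are uniform, and the regularization is fixed at $\varepsilon=1$, for which the Gibbs kernel $K=\exp(-c)$ factors exactly as $\Phi_x\Phi_y^\top$.

```python
# Illustrative sketch: Sinkhorn iterations with a factored Gibbs kernel.
# With ground cost c(x, y) = -log <phi(x), phi(y)> and eps = 1, the kernel
# K = exp(-c) equals Phi_x @ Phi_y.T, so applying K to a vector never
# requires forming the n x n matrix.
import numpy as np

rng = np.random.default_rng(0)
n, r = 1000, 16

# Hypothetical positive feature maps phi(x_i), phi(y_j) in R^r_+ (random here).
Phi_x = rng.random((n, r)) + 1e-3
Phi_y = rng.random((n, r)) + 1e-3

a = np.full(n, 1.0 / n)          # source marginal
b = np.full(n, 1.0 / n)          # target marginal

u = np.ones(n)
v = np.ones(n)
for _ in range(200):             # Sinkhorn fixed-point iterations
    Kv = Phi_x @ (Phi_y.T @ v)   # O(n*r) instead of O(n^2)
    u = a / Kv
    Ktu = Phi_y @ (Phi_x.T @ u)
    v = b / Ktu

# Marginal check: diag(u) K diag(v) should have row sums close to a.
row_sums = u * (Phi_x @ (Phi_y.T @ v))
print("max marginal error:", np.abs(row_sums - a).max())
```

Each kernel-vector product then costs $O(nr)$ rather than $O(n^2)$, which is the speedup the entry describes.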