Last-Iterate Convergence Properties of Regret-Matching Algorithms in
Games
- URL: http://arxiv.org/abs/2311.00676v1
- Date: Wed, 1 Nov 2023 17:34:58 GMT
- Title: Last-Iterate Convergence Properties of Regret-Matching Algorithms in
Games
- Authors: Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer,
Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng
- Abstract summary: We study the last-iterate convergence properties of various popular variants of RM$^+$.
We show numerically that several practical variants, such as simultaneous RM$^+$, alternating RM$^+$, and predictive RM$^+$, all lack last-iterate convergence guarantees.
We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Algorithms based on regret matching, specifically regret matching$^+$
(RM$^+$), and its variants are the most popular approaches for solving
large-scale two-player zero-sum games in practice. Unlike algorithms such as
optimistic gradient descent ascent, which have strong last-iterate and ergodic
convergence properties for zero-sum games, virtually nothing is known about the
last-iterate properties of regret-matching algorithms. Given the importance of
last-iterate convergence for numerical optimization and its relevance for
modeling real-world learning in games, in this paper we study the last-iterate
convergence properties of various popular variants of RM$^+$. First, we show
numerically that several practical variants such as simultaneous RM$^+$,
alternating RM$^+$, and simultaneous predictive RM$^+$, all lack last-iterate
convergence guarantees even on a simple $3\times 3$ game. We then prove that
recent variants of these algorithms based on a smoothing technique do enjoy
last-iterate convergence: we prove that extragradient RM$^{+}$ and smooth
Predictive RM$^+$ enjoy asymptotic last-iterate convergence (without a rate)
and $1/\sqrt{t}$ best-iterate convergence. Finally, we introduce restarted
variants of these algorithms, and show that they enjoy linear-rate last-iterate
convergence.
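To make the object of study concrete, here is a minimal sketch of simultaneous RM$^+$ on a zero-sum matrix game. This is not the paper's code: the rock-paper-scissors payoff matrix, the random nonnegative initialization, and the iteration count are illustrative assumptions, chosen only to show the update rule whose last iterates the paper analyzes.

```python
import numpy as np

def rm_plus(A, T=5000, seed=0):
    """Simultaneous Regret Matching+ (RM+) on the zero-sum game max_x min_y x^T A y.

    Returns the last-iterate strategy pair (x_T, y_T).
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    # Random nonnegative initialization, so the dynamics do not start
    # exactly at a fixed point of the update.
    Qx, Qy = rng.random(m), rng.random(n)
    for _ in range(T):
        # Current strategies: normalized positive regrets (uniform if all zero).
        x = Qx / Qx.sum() if Qx.sum() > 0 else np.full(m, 1.0 / m)
        y = Qy / Qy.sum() if Qy.sum() > 0 else np.full(n, 1.0 / n)
        v = x @ A @ y
        # Accumulate instantaneous regrets, truncated at zero (the "+" in RM+).
        Qx = np.maximum(Qx + (A @ y - v), 0.0)
        Qy = np.maximum(Qy + (v - A.T @ x), 0.0)
    return x, y

# Rock-paper-scissors: a simple 3x3 game whose unique equilibrium is uniform play.
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
x, y = rm_plus(A)
```

Note that the last iterate `(x, y)` is always a valid strategy pair, but, consistent with the paper's numerical findings for simultaneous RM$^+$, there is no guarantee that it approaches the equilibrium as $T$ grows; only averaged iterates carry the classical guarantee.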
Related papers
- Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games.
We show that OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix.
We prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue.
arXiv Detail & Related papers (2024-06-15T13:26:17Z)
- Regret Matching+: (In)Stability and Fast Convergence in Games
We show that RM+ and its predictive version can be unstable, which might cause other players to suffer large regret.
We show that these fixes are sufficient to get $O(T^{1/4})$ individual regret and $O(1)$ social regret in normal-form games via RM+ with predictions.
arXiv Detail & Related papers (2023-05-24T04:26:21Z)
- Doubly Optimal No-Regret Learning in Monotone Games
We propose the first doubly optimal no-regret learning algorithm for smooth monotone games.
Our algorithm achieves both (i) the optimal $O(\sqrt{T})$ regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal $O(1/T)$ last-iterate convergence rate to a Nash equilibrium.
arXiv Detail & Related papers (2023-01-30T17:55:53Z)
- Policy Optimization for Markov Games: Unified Framework and Faster Convergence
We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game.
We extend this algorithm to multi-player general-sum Markov Games and show a $\widetilde{\mathcal{O}}(T^{-1/2})$ convergence rate to Coarse Correlated Equilibria (CCE).
arXiv Detail & Related papers (2022-06-06T14:23:13Z)
- Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\mathrm{polylog}(T))$ after $T$ repetitions of the game.
We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
arXiv Detail & Related papers (2021-11-11T01:19:53Z)
- Last-iterate Convergence in Extensive-Form Games
We study last-iterate convergence of optimistic algorithms in sequential games.
We show that all of these algorithms enjoy last-iterate convergence, with some of them even converging exponentially fast.
arXiv Detail & Related papers (2021-06-27T22:02:26Z)
- Tight last-iterate convergence rates for no-regret learning in multi-player games
We show that the optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ in smooth monotone games.
We also show that the $O(1/\sqrt{T})$ rate is tight for all $p$-SCLI algorithms, which include OG as a special case.
arXiv Detail & Related papers (2020-10-26T17:06:19Z)
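The contrast drawn throughout this list, between plain gradient dynamics (which can cycle or diverge in their last iterate) and their optimistic counterparts, can be illustrated with a short sketch. The scalar bilinear game $f(x,y) = xy$ and the specific "$2g_t - g_{t-1}$" form of the optimistic gradient update are illustrative assumptions here, not details taken from any one of the papers above.

```python
def optimistic_gradient(eta=0.1, T=2000):
    """Optimistic gradient (OG) on the saddle-point problem min_x max_y f(x, y) = x * y.

    The unique saddle point is (0, 0). Plain gradient descent-ascent
    spirals outward on this game, while OG's last iterate converges.
    """
    x, y = 1.0, 1.0
    gx_prev, gy_prev = y, x      # gradients at the initial point
    for _ in range(T):
        gx, gy = y, x            # current gradients: df/dx = y, df/dy = x
        # The "2*g_t - g_{t-1}" correction is what distinguishes OG from plain GDA.
        x, y = x - eta * (2 * gx - gx_prev), y + eta * (2 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
    return x, y

x, y = optimistic_gradient()  # last iterate ends up near the saddle point (0, 0)
```

Swapping the update for plain descent-ascent, `x - eta * gx` and `y + eta * gy`, makes the iterates grow without bound on this same game, which is the kind of last-iterate failure the regret-matching paper documents for its own family of algorithms.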
This list is automatically generated from the titles and abstracts of the papers in this site.