Efficient Last-iterate Convergence Algorithms in Solving Games
- URL: http://arxiv.org/abs/2308.11256v1
- Date: Tue, 22 Aug 2023 07:59:49 GMT
- Title: Efficient Last-iterate Convergence Algorithms in Solving Games
- Authors: Linjian Meng, Zhenxing Ge, Wenbin Li, Bo An, Yang Gao
- Abstract summary: No-regret algorithms are popular for learning Nash equilibrium in two-player zero-sum normal-form games (NFGs) and extensive-form games (EFGs).
Recent works propose a Reward Transformation (RT) framework for MWU, which removes the uniqueness condition and achieves competitive performance with OMWU.
We show that the bottleneck of RT-based algorithms is the speed of solving strongly convex-concave optimization problems (SCCPs).
- Score: 20.00785679098103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: No-regret algorithms are popular for learning Nash equilibrium (NE) in
two-player zero-sum normal-form games (NFGs) and extensive-form games (EFGs).
Many recent works study no-regret algorithms with last-iterate convergence.
Among them, the two most famous algorithms are Optimistic Gradient Descent
Ascent (OGDA) and Optimistic Multiplicative Weight Update (OMWU). However, OGDA
has high per-iteration complexity. OMWU exhibits a lower per-iteration
complexity but poorer empirical performance, and its convergence holds only
when NE is unique. Recent works propose a Reward Transformation (RT) framework
for MWU, which removes the uniqueness condition and achieves competitive
performance with OMWU. Unfortunately, RT-based algorithms perform worse than
OGDA under the same number of iterations, and their convergence guarantee is
based on the continuous-time feedback assumption, which does not hold in most
scenarios. To address these issues, we provide a closer analysis of the RT
framework, which holds for both continuous and discrete-time feedback. We
demonstrate that the essence of the RT framework is to transform the problem of
learning NE in the original game into a series of strongly convex-concave
optimization problems (SCCPs). We show that the bottleneck of RT-based
algorithms is the speed of solving SCCPs. To improve their empirical
performance, we design a novel transformation method so that the SCCPs can be
solved by Regret Matching+ (RM+), a no-regret algorithm with better empirical
performance, resulting in Reward Transformation RM+ (RTRM+). RTRM+ enjoys
last-iterate convergence under the discrete-time feedback setting. Using the
counterfactual regret decomposition framework, we propose Reward Transformation
CFR+ (RTCFR+) to extend RTRM+ to EFGs. Experimental results show that our
algorithms significantly outperform existing last-iterate convergence
algorithms and RM+ (CFR+).
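To make the pipeline described in the abstract concrete, below is a minimal, hypothetical Python sketch, not the authors' RTRM+ implementation: it uses a quadratic penalty toward an anchor strategy as the "reward transformation" (the paper designs its own transformation method), solves each resulting strongly convex-concave matrix game with plain Regret Matching+ (RM+), and then re-centers the anchor. The payoff matrix, regularization weight tau, and iteration budgets are illustrative assumptions.

```python
import numpy as np

def rm_plus_step(Q, utility, strategy):
    """One Regret Matching+ update: accumulate clipped regrets, renormalize."""
    regret = utility - utility @ strategy          # instantaneous regret of each action
    Q = np.maximum(Q + regret, 0.0)                # RM+ keeps cumulative regrets non-negative
    total = Q.sum()
    next_strategy = Q / total if total > 0 else np.full_like(Q, 1.0 / Q.size)
    return Q, next_strategy

def reward_transformed_rm_plus(A, tau=0.1, outer_iters=50, inner_iters=200):
    """Illustrative loop: each outer step solves a quadratically regularized
    (hence strongly convex-concave) matrix game with RM+, then re-centers the
    regularizer at the strategies just computed."""
    n, m = A.shape
    x = np.full(n, 1.0 / n)                        # row player maximizes x^T A y
    y = np.full(m, 1.0 / m)                        # column player minimizes it
    x_ref, y_ref = x.copy(), y.copy()              # anchor ("reference") strategies
    for _ in range(outer_iters):
        Qx, Qy = np.zeros(n), np.zeros(m)
        for _ in range(inner_iters):
            gx = A @ y - tau * (x - x_ref)         # gradient of the transformed payoff for x
            gy = -A.T @ x - tau * (y - y_ref)      # gradient of the transformed payoff for y
            Qx, x = rm_plus_step(Qx, gx, x)
            Qy, y = rm_plus_step(Qy, gy, y)
        x_ref, y_ref = x.copy(), y.copy()          # the transformation is re-anchored here
    return x, y

if __name__ == "__main__":
    # Rock-paper-scissors payoffs for the row player; the unique NE is uniform play.
    A = np.array([[0.0, -1.0, 1.0],
                  [1.0, 0.0, -1.0],
                  [-1.0, 1.0, 0.0]])
    x, y = reward_transformed_rm_plus(A)
    print("row strategy:", np.round(x, 3))
    print("col strategy:", np.round(y, 3))
```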
Related papers
- Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent [44.080852682765276]
This work explores minimizing weighted counterfactual regret with optimistic Online Mirror Descent (OMD).
The resulting algorithm, PDCFR+, integrates Predictive CFR+ (PCFR+) and Discounted CFR (DCFR) in a principled manner, swiftly mitigating the negative effects of dominated actions.
Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games.
arXiv Detail & Related papers (2024-04-22T05:37:22Z) - Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games [72.43065138581589]
We study the last-iterate convergence properties of various popular variants of RM$+$.
We show numerically that several practical variants such as simultaneous RM$+$, alternating RM$+$, and predictive RM$+$ all lack last-iterate convergence guarantees.
We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence.
arXiv Detail & Related papers (2023-11-01T17:34:58Z) - Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters.
arXiv Detail & Related papers (2023-10-21T18:38:13Z) - The Power of Regularization in Solving Extensive-Form Games [22.557806157585834]
We propose a series of new algorithms based on regularizing the payoff functions of the game.
In particular, we first show that dilated optimistic mirror descent (DOMD) can achieve a fast $\tilde{O}(1/T)$ last-iterate convergence.
We also show that Reg-CFR, with a variant of the optimistic mirror descent algorithm as regret minimizer, can achieve $O(1/T^{1/4})$ best-iterate and $O(1/T^{3/4})$ average-iterate convergence rates.
arXiv Detail & Related papers (2022-06-19T22:10:38Z) - HessianFR: An Efficient Hessian-based Follow-the-Ridge Algorithm for Minimax Optimization [18.61046317693516]
HessianFR is an efficient Hessian-based Follow-the-Ridge algorithm with theoretical guarantees.
Experiments of training generative adversarial networks (GANs) have been conducted on both synthetic and real-world large-scale image datasets.
arXiv Detail & Related papers (2022-05-23T04:28:52Z) - Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent [67.60077332154853]
Counterfactual Regret Minimization (CFR) is a regret minimization algorithm that minimizes the total regret by minimizing the local counterfactual regrets.
Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD) algorithms are regret minimization algorithms in Online Convex Optimization.
We provide a new way to analyze and extend CFRs by proving that CFR with Regret Matching and CFR with Regret Matching+ are special forms of FTRL and OMD, respectively.
arXiv Detail & Related papers (2021-10-11T02:12:25Z) - Last-iterate Convergence in Extensive-Form Games [49.31256241275577]
We study last-iterate convergence of optimistic algorithms in sequential games.
We show that all of these algorithms enjoy last-iterate convergence, with some of them even converging exponentially fast.
arXiv Detail & Related papers (2021-06-27T22:02:26Z) - Faster Game Solving via Predictive Blackwell Approachability: Connecting Regret Matching and Mirror Descent [119.5481797273995]
Follow-the-regularized-leader (FTRL) and online mirror descent (OMD) are the most prevalent regret minimizers in online convex optimization.
We show that RM and RM+ are the algorithms that result from running FTRL and OMD, respectively, to select the halfspace to force at all times in the underlying Blackwell approachability game.
In experiments across 18 common zero-sum extensive-form benchmark games, we show that predictive RM+ (sketched after this list) coupled with counterfactual regret minimization converges vastly faster than the fastest prior algorithms.
arXiv Detail & Related papers (2020-07-28T16:49:55Z) - Stochastic Regret Minimization in Extensive-Form Games [109.43344748069933]
Monte-Carlo counterfactual regret minimization (MCCFR) is the state-of-the-art algorithm for solving sequential games that are too large for full tree traversals.
We develop a new framework for constructing stochastic regret minimization methods.
We report extensive experiments on three games, in which some variants of our methods outperform MCCFR.
arXiv Detail & Related papers (2020-02-19T23:05:41Z)
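Several of the entries above concern predictive (optimistic) variants of RM+. The sketch below is a hypothetical Python illustration of the predictive RM+ update on a zero-sum matrix game, not code from any of the cited papers: each player forms its strategy from cumulative clipped regrets plus a prediction equal to the most recent instantaneous regret; linear averaging of the iterates is used here only for simplicity, whereas the cited works often use weighted averaging.

```python
import numpy as np

def normalize(v):
    """Map clipped regrets to a strategy on the simplex (uniform if all zero)."""
    s = v.sum()
    return v / s if s > 0 else np.full_like(v, 1.0 / v.size)

def predictive_rm_plus(A, iters=2000):
    """Predictive RM+ self-play on a zero-sum matrix game (illustrative sketch)."""
    n, m = A.shape
    Rx, Ry = np.zeros(n), np.zeros(m)              # cumulative clipped regrets
    px, py = np.zeros(n), np.zeros(m)              # predictions = last observed regrets
    avg_x, avg_y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        x = normalize(np.maximum(Rx + px, 0.0))    # optimistic step: count the last regret once more
        y = normalize(np.maximum(Ry + py, 0.0))
        gx, gy = A @ y, -A.T @ x                   # payoff vectors (row maximizes, column minimizes)
        rx, ry = gx - gx @ x, gy - gy @ y          # instantaneous regrets
        Rx, Ry = np.maximum(Rx + rx, 0.0), np.maximum(Ry + ry, 0.0)
        px, py = rx, ry                            # next round's prediction
        avg_x += x
        avg_y += y
    return avg_x / iters, avg_y / iters

if __name__ == "__main__":
    A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
    x_bar, y_bar = predictive_rm_plus(A)
    print("average strategies:", np.round(x_bar, 3), np.round(y_bar, 3))
```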