Faster Game Solving via Predictive Blackwell Approachability: Connecting
Regret Matching and Mirror Descent
- URL: http://arxiv.org/abs/2007.14358v2
- Date: Mon, 8 Mar 2021 04:15:38 GMT
- Title: Faster Game Solving via Predictive Blackwell Approachability: Connecting
Regret Matching and Mirror Descent
- Authors: Gabriele Farina and Christian Kroer and Tuomas Sandholm
- Abstract summary: Follow-the-regularized-leader (FTRL) and online mirror descent (OMD) are the most prevalent regret minimizers in online convex optimization.
We show that RM and RM+ are the algorithms that result from running FTRL and OMD, respectively, to select the halfspace to force at all times in the underlying Blackwell approachability game.
In experiments across 18 common zero-sum extensive-form benchmark games, we show that predictive RM+ coupled with counterfactual regret minimization converges vastly faster than the fastest prior algorithms.
- Score: 119.5481797273995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Blackwell approachability is a framework for reasoning about repeated games
with vector-valued payoffs. We introduce predictive Blackwell approachability,
where an estimate of the next payoff vector is given, and the decision maker
tries to achieve better performance based on the accuracy of that estimator. In
order to derive algorithms that achieve predictive Blackwell approachability,
we start by showing a powerful connection between four well-known algorithms.
Follow-the-regularized-leader (FTRL) and online mirror descent (OMD) are the
most prevalent regret minimizers in online convex optimization. In spite of
this prevalence, the regret matching (RM) and regret matching+ (RM+) algorithms
have been preferred in the practice of solving large-scale games (as the local
regret minimizers within the counterfactual regret minimization framework). We
show that RM and RM+ are the algorithms that result from running FTRL and OMD,
respectively, to select the halfspace to force at all times in the underlying
Blackwell approachability game. By applying the predictive variants of FTRL or
OMD to this connection, we obtain predictive Blackwell approachability
algorithms, as well as predictive variants of RM and RM+. In experiments across
18 common zero-sum extensive-form benchmark games, we show that predictive RM+
coupled with counterfactual regret minimization converges vastly faster than
the fastest prior algorithms (CFR+, DCFR, LCFR) across all games but two of the
poker games, sometimes by two or more orders of magnitude.
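To make the updates concrete, here is a minimal Python sketch of the predictive RM+ rule suggested by this connection: play x_t proportional to [Q_{t-1} + m_t]^+, where Q is the clipped aggregate regret and m_t predicts the next instantaneous regret. The class name, the prediction choice m_t = r_{t-1}, and the uniform fallback are illustrative assumptions, not the paper's exact pseudocode; setting the prediction to zero recovers plain RM+.

```python
import numpy as np

def normalize(q):
    """Rescale a nonnegative vector to the simplex; uniform fallback if all-zero."""
    s = q.sum()
    return q / s if s > 0 else np.full_like(q, 1.0 / len(q))

class PredictiveRMPlus:
    """Sketch of predictive regret matching+ (PRM+) at one decision point."""

    def __init__(self, n_actions):
        self.q = np.zeros(n_actions)  # aggregate regret Q_t, kept nonnegative
        self.m = np.zeros(n_actions)  # prediction of the next instantaneous regret

    def strategy(self):
        # Play proportional to the clipped, prediction-adjusted aggregate regrets.
        return normalize(np.maximum(self.q + self.m, 0.0))

    def observe(self, utility):
        """utility[i] is the payoff action i would have received this round."""
        x = self.strategy()
        r = utility - x @ utility            # instantaneous regret vector
        self.q = np.maximum(self.q + r, 0.0)  # RM+ clipping keeps Q nonnegative
        self.m = r                            # predict the next regret by the last one
```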
Related papers
- Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent [44.080852682765276]
This work explores minimizing weighted counterfactual regret with optimistic Online Mirror Descent (OMD).
It integrates PCFR+ and Discounted CFR (DCFR) in a principled manner, swiftly mitigating the negative effects of dominated actions.
Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games.
arXiv Detail & Related papers (2024-04-22T05:37:22Z)
- Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games [72.43065138581589]
We study the last-iterate convergence properties of various popular variants of RM+.
We show numerically that several practical variants such as simultaneous RM+, alternating RM+, and predictive RM+ all lack last-iterate convergence guarantees.
We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence.
arXiv Detail & Related papers (2023-11-01T17:34:58Z)
- Efficient Last-iterate Convergence Algorithms in Solving Games [20.00785679098103]
No-regret algorithms are popular for learning a Nash equilibrium in two-player zero-sum normal-form games (NFGs) and extensive-form games (EFGs); a minimal self-play sketch of this setup appears after this list.
Recent works propose a Reward Transformation (RT) framework for MWU, which removes the unique-equilibrium condition that OMWU requires while achieving competitive performance with OMWU.
We show that the bottleneck of RT-based algorithms is the speed of solving convex-concave optimization problems (SCCPs).
arXiv Detail & Related papers (2023-08-22T07:59:49Z)
- Regret Matching+: (In)Stability and Fast Convergence in Games [68.13214224119024]
We show that RM+ and its predictive version can be unstable, which might cause other players to suffer large regret.
We show that simple stabilizing fixes are sufficient to get $O(T^{1/4})$ individual regret and $O(1)$ social regret in normal-form games via RM+ with predictions.
arXiv Detail & Related papers (2023-05-24T04:26:21Z)
- Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent [67.60077332154853]
Counterfactual Regret Minimization (CFR) is a regret minimization algorithm that minimizes total regret by minimizing the local counterfactual regrets.
Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD) are regret minimization algorithms in Online Convex Optimization.
We provide a new way to analyze and extend CFR, by proving that CFR with Regret Matching and CFR with Regret Matching+ are special forms of FTRL and OMD, respectively.
arXiv Detail & Related papers (2021-10-11T02:12:25Z)
- Last-iterate Convergence in Extensive-Form Games [49.31256241275577]
We study last-iterate convergence of optimistic algorithms in sequential games.
We show that all of these algorithms enjoy last-iterate convergence, with some of them even converging exponentially fast.
arXiv Detail & Related papers (2021-06-27T22:02:26Z)
- Stochastic Regret Minimization in Extensive-Form Games [109.43344748069933]
Monte-Carlo counterfactual regret minimization (MCCFR) is the state-of-the-art algorithm for solving sequential games that are too large to traverse in full.
We develop a new framework for constructing stochastic regret minimization methods.
We report extensive experiments on three games, where some variants of our methods outperform MCCFR.
arXiv Detail & Related papers (2020-02-19T23:05:41Z)
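Several of the papers above share the same experimental setup: two regret minimizers in self-play on a zero-sum game, with time-averaged strategies converging to a Nash equilibrium. Below is a toy sketch of that loop, referenced from the "Efficient Last-iterate Convergence Algorithms" entry, reusing the illustrative PredictiveRMPlus class from the earlier snippet and assuming rock-paper-scissors payoffs purely for demonstration.

```python
# Self-play in a zero-sum normal-form game: both players run predictive RM+.
# The time-averaged strategies approach a Nash equilibrium (here: uniform).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])  # rock-paper-scissors payoffs for player 1

p1, p2 = PredictiveRMPlus(3), PredictiveRMPlus(3)
avg1, avg2 = np.zeros(3), np.zeros(3)
T = 10_000
for _ in range(T):
    x, y = p1.strategy(), p2.strategy()
    avg1 += x
    avg2 += y
    p1.observe(A @ y)         # player 1 maximizes x^T A y
    p2.observe(-(A.T @ x))    # player 2 maximizes -x^T A y
print(avg1 / T, avg2 / T)     # both ≈ [1/3, 1/3, 1/3]
```

Schemes from the papers above, such as alternation or weighted averaging (linear weights in LCFR, discounting in DCFR), would slot into this loop at the averaging and observation steps.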