Pure Monte Carlo Counterfactual Regret Minimization
- URL: http://arxiv.org/abs/2309.03084v3
- Date: Fri, 13 Oct 2023 06:01:17 GMT
- Title: Pure Monte Carlo Counterfactual Regret Minimization
- Authors: Ju Qi, Ting Feng, Falun Hei, Zhemei Fang, Yunfeng Luo
- Abstract summary: This paper proposes a new algorithm called Pure CFR (PCFR) based on CFR.
It can be seen as a combination of CFR and Fictitious Play (FP), inheriting the concept of counterfactual regret (value) from CFR.
The resulting Pure MCCFR (PMCCFR) can significantly reduce the time and space complexity of one iteration.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual Regret Minimization (CFR) and its variants are currently the best
algorithms for solving large-scale incomplete-information games.
However, we see two problems with CFR: first, each CFR iteration requires matrix
multiplication, so the time complexity of a single iteration is high; second, real-world
games differ in their characteristics, and no single CFR algorithm is perfectly suitable
for all game problems.
To address these two problems, this paper proposes a new CFR-based algorithm called Pure
CFR (PCFR). PCFR can be seen as a combination of CFR and Fictitious Play (FP): it inherits
the concept of counterfactual regret (value) from CFR, but uses the best-response strategy
instead of the regret-matching strategy for the next iteration. The algorithm has three
advantages. First, PCFR can be combined with any CFR variant; combining it with Monte Carlo
CFR yields Pure MCCFR (PMCCFR), which significantly reduces the time and space complexity
of a single iteration. Second, our experiments show that PMCCFR converges 2-3 times faster
than MCCFR. Finally, there is a class of games that is particularly well suited to PCFR;
we call such games clear games, and they are characterized by a high proportion of dominated
strategies. Experiments show that on clear games the convergence rate of PMCCFR is two
orders of magnitude higher than that of MCCFR.
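To make the contrast concrete, here is a minimal sketch (not the paper's reference implementation; all names are illustrative assumptions) of the two strategy-update rules the abstract mentions: regret matching, which plays actions in proportion to positive cumulative counterfactual regret, and a PCFR-style best response, which, as we read the abstract, puts all probability on the action with the highest accumulated counterfactual regret/value and therefore yields a pure strategy.

```python
import numpy as np

def regret_matching_strategy(cumulative_regret: np.ndarray) -> np.ndarray:
    """CFR-style update: play each action in proportion to its positive cumulative regret."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    # No action has positive regret: fall back to the uniform strategy.
    return np.full_like(cumulative_regret, 1.0 / len(cumulative_regret))

def best_response_strategy(cumulative_regret: np.ndarray) -> np.ndarray:
    """PCFR-style update (our reading of the abstract): put all probability on the
    action with the largest cumulative regret, giving a pure (deterministic) strategy."""
    strategy = np.zeros_like(cumulative_regret)
    strategy[np.argmax(cumulative_regret)] = 1.0
    return strategy

# Example: three actions at a single information set.
regrets = np.array([2.0, -1.0, 0.5])
print(regret_matching_strategy(regrets))  # [0.8 0.  0.2]
print(best_response_strategy(regrets))    # [1. 0. 0.]
```

Because the PCFR update is pure, only the sampled best-response actions need to be stored and traversed, which is a plausible reading of why PMCCFR can cut the time and space cost of a single iteration.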
Related papers
- Reevaluating Policy Gradient Methods for Imperfect-Information Games [94.45878689061335]
We conduct the largest-ever exploitability comparison of DRL algorithms for imperfect-information games.
Over 5600 training runs, FP, DO, and CFR-based approaches fail to outperform generic policy gradient methods.
arXiv Detail & Related papers (2025-02-13T03:38:41Z)
- Rapid Learning in Constrained Minimax Games with Negative Momentum [5.086470864936883]
We introduce a novel framework for momentum buffer updating, which extends the findings of negative momentum from the unconstrained setting to the constrained setting.
Experimental results on both Normal Form Games (NFGs) and Extensive Form Games (EFGs) demonstrate that our momentum techniques can significantly improve algorithm performance.
arXiv Detail & Related papers (2024-12-31T16:32:51Z)
- Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent [44.080852682765276]
This work explores minimizing weighted counterfactual regret with optimistic Online Mirror Descent (OMD).
It integrates PCFR+ and Discounted CFR (DCFR) in a principled manner, swiftly mitigating negative effects of dominated actions.
Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games.
arXiv Detail & Related papers (2024-04-22T05:37:22Z)
- RL-CFR: Improving Action Abstraction for Imperfect Information Extensive-Form Games with Reinforcement Learning [42.80561441946148]
We introduce RL-CFR, a novel reinforcement learning (RL) approach for dynamic action abstraction.
RL-CFR builds upon our innovative Markov Decision Process (MDP) formulation, with states corresponding to public information and actions represented as feature vectors indicating specific action abstractions.
In experiments on Heads-up No-limit Texas Hold'em, RL-CFR outperforms ReBeL's replication and Slumbot, demonstrating significant win-rate margins of $64\pm 11$ and $84\pm 17$ mbb/hand, respectively.
arXiv Detail & Related papers (2024-03-07T09:12:23Z)
- Accelerated Fuzzy C-Means Clustering Based on New Affinity Filtering and Membership Scaling [74.85538972921917]
Fuzzy C-Means (FCM) is a widely used clustering method.
FCM has low efficiency in the mid-to-late stage of the clustering process.
FCM based on new affinity filtering and membership scaling (AMFCM) is proposed to accelerate the whole convergence process.
arXiv Detail & Related papers (2023-02-14T14:20:31Z)
- Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent [67.60077332154853]
Counterfactual Regret Minimization (CFR) is a regret minimization algorithm that minimizes the total regret by minimizing the local counterfactual regrets.
Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD) algorithms are regret minimization algorithms in Online Convex Optimization.
We provide a new way to analyze and extend CFRs, by proving that CFR with Regret Matching and CFR with Regret Matching+ are special forms of FTRL and OMD.
arXiv Detail & Related papers (2021-10-11T02:12:25Z)
- Last-iterate Convergence in Extensive-Form Games [49.31256241275577]
We study last-iterate convergence of optimistic algorithms in sequential games.
We show that all of these algorithms enjoy last-iterate convergence, with some of them even converging exponentially fast.
arXiv Detail & Related papers (2021-06-27T22:02:26Z)
- NNCFR: Minimize Counterfactual Regret with Neural Networks [4.418221583366099]
This paper introduces Neural Network Counterfactual Regret Minimization (NNCFR), an improved variant of Deep CFR.
NNCFR converges faster and is more stable than Deep CFR, and outperforms Deep CFR with respect to exploitability and head-to-head performance on test games.
arXiv Detail & Related papers (2021-05-26T04:58:36Z)
- Faster Game Solving via Predictive Blackwell Approachability: Connecting Regret Matching and Mirror Descent [119.5481797273995]
Follow-the-regularized-leader (FTRL) and online mirror descent (OMD) are the most prevalent regret minimizers in online convex optimization.
We show that RM and RM+ are the algorithms that result from running FTRL and OMD, respectively, to select the halfspace to force at all times in the underlying Blackwell approachability game.
In experiments across 18 common zero-sum extensive-form benchmark games, we show that predictive RM+ coupled with counterfactual regret minimization converges vastly faster than the fastest prior algorithms.
arXiv Detail & Related papers (2020-07-28T16:49:55Z)
- Stochastic Regret Minimization in Extensive-Form Games [109.43344748069933]
Monte-Carlo counterfactual regret minimization (MCCFR) is the state-of-the-art algorithm for solving sequential games whose game trees are too large to traverse in full.
We develop a new framework for developing regret minimization methods.
We show extensive experiments on three games, where some variants of our methods outperform MCCFR.
arXiv Detail & Related papers (2020-02-19T23:05:41Z)