Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play
- URL: http://arxiv.org/abs/2309.03084v4
- Date: Sun, 27 Oct 2024 09:16:16 GMT
- Title: Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play
- Authors: Ju Qi, Falin Hei, Ting Feng, Dengbing Yi, Zhemei Fang, Yunfeng Luo
- Abstract summary: We introduce a new MC-based algorithm for solving imperfect information games, called MCCFVFP.
MCCFVFP combines CFR's counterfactual value calculations with fictitious play's best response strategy.
Results show that MCCFVFP achieved convergence speeds approximately 20% to 50% faster than the most advanced MCCFR variants.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual Regret Minimization (CFR) and its variants are widely recognized as effective algorithms for solving extensive-form imperfect information games. Recently, many improvements have focused on enhancing the convergence speed of the CFR algorithm. However, most of these variants are not applicable under Monte Carlo (MC) conditions, making them unsuitable for training in large-scale games. We introduce a new MC-based algorithm for solving extensive-form imperfect information games, called MCCFVFP (Monte Carlo Counterfactual Value-Based Fictitious Play). MCCFVFP combines CFR's counterfactual value calculations with fictitious play's best response strategy, leveraging the strengths of fictitious play to gain significant advantages in games with a high proportion of dominated strategies. Experimental results show that MCCFVFP achieved convergence speeds approximately 20% to 50% faster than the most advanced MCCFR variants in games like poker and other test games.
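The abstract describes the core loop but gives no pseudocode; the following is a minimal sketch of the underlying idea on a toy matrix game, using Monte Carlo value estimates in place of exact expected values and a fictitious-play best response in place of regret matching. The game, function names, and sample sizes are illustrative assumptions, not the paper's implementation, which operates on extensive-form game trees with counterfactual values at each information set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy zero-sum matrix game (rock-paper-scissors); the row player
# maximizes x^T A y, the column player minimizes it.
A = np.array([[ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])

def sampled_action_values(payoff, opp_avg, n_samples=32):
    """Monte Carlo estimate of each action's value against the
    opponent's average strategy (standing in for the CFR-style
    counterfactual values the full algorithm computes)."""
    opp_actions = rng.choice(len(opp_avg), size=n_samples, p=opp_avg)
    return payoff[:, opp_actions].mean(axis=1)

def fictitious_play(payoff, iters=5000):
    counts = [np.ones(payoff.shape[0]), np.ones(payoff.shape[1])]
    for _ in range(iters):
        avg = [c / c.sum() for c in counts]
        # Each player best-responds to the sampled values; the column
        # player sees the negated, transposed payoff.
        counts[0][np.argmax(sampled_action_values(payoff, avg[1]))] += 1
        counts[1][np.argmax(sampled_action_values(-payoff.T, avg[0]))] += 1
    return [c / c.sum() for c in counts]

x, y = fictitious_play(A)
print("row average strategy:", np.round(x, 3))  # tends toward [1/3, 1/3, 1/3]
```

As in standard fictitious play, it is the average strategy (the normalized action counts), not the last iterate, that converges toward a Nash equilibrium.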
Related papers
- Reevaluating Policy Gradient Methods for Imperfect-Information Games [94.45878689061335]
We conduct the largest-ever exploitability comparison of DRL algorithms for imperfect-information games.
In over 5600 training runs, FP, DO, and CFR-based approaches fail to outperform generic policy gradient methods.
arXiv Detail & Related papers (2025-02-13T03:38:41Z)
- Rapid Learning in Constrained Minimax Games with Negative Momentum [5.086470864936883]
We introduce a novel framework for momentum-buffer updating, which extends findings on negative momentum from the unconstrained setting to the constrained setting.
Experimental results on both Normal Form Games (NFGs) and Extensive Form Games (EFGs) demonstrate that our momentum techniques can significantly improve algorithm performance.
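As a hedged illustration of the momentum-buffer idea (not the paper's constrained-setting algorithm), here is negative-momentum alternating gradient descent-ascent on an unconstrained bilinear game; the matrix, step size, and momentum coefficient are arbitrary choices for the sketch:

```python
import numpy as np

# Bilinear saddle-point toy problem: min_x max_y x^T B y,
# whose unique equilibrium is (0, 0).
B = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def alt_gda_negative_momentum(steps=5000, lr=0.05, beta=-0.5):
    x, y = np.array([1.0, 1.0]), np.array([1.0, -1.0])
    vx, vy = np.zeros(2), np.zeros(2)
    for _ in range(steps):
        vx = beta * vx - lr * (B @ y)    # momentum buffer, negative beta
        x = x + vx
        vy = beta * vy + lr * (B.T @ x)  # alternating: uses the fresh x
        y = y + vy
    return x, y

x, y = alt_gda_negative_momentum()
# With beta = 0 the alternating iterates cycle without converging; a
# suitable negative beta damps the rotation so, for small enough lr,
# both players approach the equilibrium.
print(np.linalg.norm(x), np.linalg.norm(y))
```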
arXiv Detail & Related papers (2024-12-31T16:32:51Z)
- Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent [44.080852682765276]
This work explores minimizing weighted counterfactual regret with optimistic Online Mirror Descent (OMD).
It integrates PCFR+ and Discounted CFR (DCFR) in a principled manner, swiftly mitigating the negative effects of dominated actions.
Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games.
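A rough sketch of the two ingredients being combined, with hypothetical function names and a simplified weighting scheme (the paper's exact discounting and prediction rules differ):

```python
import numpy as np

def predictive_strategy(Q, prediction):
    """PCFR+-style ingredient: play proportionally to the clipped
    cumulative regrets plus a prediction of the next regret."""
    p = np.maximum(Q + prediction, 0.0)
    s = p.sum()
    return p / s if s > 0 else np.full(len(Q), 1.0 / len(Q))

def discounted_regret_update(Q, inst_regret, t, alpha=1.5):
    """DCFR-style ingredient: weight previously accumulated regret by
    t^alpha / (t^alpha + 1) before adding the new instantaneous regret,
    so regret piled up on dominated actions decays quickly."""
    w = t**alpha / (t**alpha + 1.0)
    return np.maximum(w * Q + inst_regret, 0.0)
```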
arXiv Detail & Related papers (2024-04-22T05:37:22Z)
- RL-CFR: Improving Action Abstraction for Imperfect Information Extensive-Form Games with Reinforcement Learning [42.80561441946148]
We introduce RL-CFR, a novel reinforcement learning (RL) approach for dynamic action abstraction.
RL-CFR builds upon our innovative Markov Decision Process (MDP) formulation, with states corresponding to public information and actions represented as feature vectors indicating specific action abstractions.
In experiments on Heads-up No-limit Texas Hold'em, RL-CFR outperforms a replication of ReBeL and Slumbot, demonstrating significant win-rate margins of 64 ± 11 and 84 ± 17 mbb/hand, respectively.
arXiv Detail & Related papers (2024-03-07T09:12:23Z)
- Accelerated Fuzzy C-Means Clustering Based on New Affinity Filtering and Membership Scaling [74.85538972921917]
Fuzzy C-Means (FCM) is a widely used clustering method.
However, FCM converges slowly in the mid-to-late stages of the clustering process.
An FCM variant based on new affinity filtering and membership scaling (AMFCM) is proposed to accelerate the whole convergence process.
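For context, here is a minimal plain fuzzy c-means loop, the baseline that AMFCM accelerates; the affinity-filtering and membership-scaling steps themselves are not reproduced, and the synthetic data is an illustrative assumption:

```python
import numpy as np

def fcm(X, k=3, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means. U[i, j] is the membership degree of
    point i in cluster j; m > 1 controls fuzziness."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(k), size=len(X))  # random initial memberships
    for _ in range(iters):
        W = U**m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Membership update: u_ij = 1 / sum_c (d_ij / d_ic)^(2/(m-1)).
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
    return U, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2))
               for c in ([0, 0], [3, 0], [0, 3])])
U, centers = fcm(X)
print(np.round(centers, 2))  # roughly the three generating means
```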
arXiv Detail & Related papers (2023-02-14T14:20:31Z)
- Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent [67.60077332154853]
Counterfactual Regret Minimization (CFR) is a regret minimization algorithm that minimizes the total regret by minimizing the local counterfactual regrets.
Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD) are regret minimization algorithms in Online Convex Optimization.
We provide a new way to analyze and extend CFR variants, by proving that CFR with Regret Matching and CFR with Regret Matching+ are special forms of FTRL and OMD, respectively.
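Concretely, the Regret Matching rule in question is the simple normalized-positive-regret update below; the rock-paper-scissors self-play demo is an illustrative assumption, not the paper's experiment:

```python
import numpy as np

def regret_matching(cum_regret):
    """Regret Matching: play each action with probability proportional
    to its positive cumulative regret; uniform if none is positive."""
    r = np.maximum(cum_regret, 0.0)
    s = r.sum()
    return r / s if s > 0 else np.full(len(cum_regret), 1.0 / len(cum_regret))

# Self-play on rock-paper-scissors: both average strategies converge
# to the uniform Nash equilibrium.
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
R = [np.zeros(3), np.zeros(3)]
avg = [np.zeros(3), np.zeros(3)]
T = 10000
for _ in range(T):
    s = [regret_matching(R[0]), regret_matching(R[1])]
    u = [A @ s[1], -A.T @ s[0]]      # per-action expected utilities
    for i in range(2):
        R[i] += u[i] - s[i] @ u[i]   # regret against the played strategy
        avg[i] += s[i]
print(np.round(avg[0] / T, 3))       # approaches [0.333, 0.333, 0.333]
```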
arXiv Detail & Related papers (2021-10-11T02:12:25Z)
- Last-iterate Convergence in Extensive-Form Games [49.31256241275577]
We study last-iterate convergence of optimistic algorithms in sequential games.
We show that all of these algorithms enjoy last-iterate convergence, with some of them even converging exponentially fast.
arXiv Detail & Related papers (2021-06-27T22:02:26Z)
- Faster Game Solving via Predictive Blackwell Approachability: Connecting Regret Matching and Mirror Descent [119.5481797273995]
Follow-the-regularized-leader (FTRL) and online mirror descent (OMD) are the most prevalent regret minimizers in online convex optimization.
We show that RM and RM+ are the algorithms that result from running FTRL and OMD, respectively, to select the halfspace to force at all times in the underlying Blackwell approachability game.
In experiments across 18 common zero-sum extensive-form benchmark games, we show that predictive RM+ coupled with counterfactual regret minimization converges vastly faster than the fastest prior algorithms.
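A minimal sketch of predictive Regret Matching+, assuming the standard formulation in which the last observed instantaneous regret serves as the prediction; the class and method names are illustrative:

```python
import numpy as np

class PredictiveRMPlus:
    """Predictive Regret Matching+: RM+ that acts on the clipped
    cumulative regret plus a prediction (here, the last observed
    instantaneous regret)."""

    def __init__(self, n_actions):
        self.Q = np.zeros(n_actions)      # clipped cumulative regret
        self.pred = np.zeros(n_actions)   # prediction of the next regret

    def strategy(self):
        p = np.maximum(self.Q + self.pred, 0.0)
        s = p.sum()
        return p / s if s > 0 else np.full(len(self.Q), 1.0 / len(self.Q))

    def observe(self, utilities):
        s = self.strategy()
        r = utilities - s @ utilities          # instantaneous regret
        self.Q = np.maximum(self.Q + r, 0.0)   # RM+ clipping at zero
        self.pred = r                          # reuse as next prediction
```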
arXiv Detail & Related papers (2020-07-28T16:49:55Z)
- Stochastic Regret Minimization in Extensive-Form Games [109.43344748069933]
Monte Carlo counterfactual regret minimization (MCCFR) is the state-of-the-art algorithm for solving sequential games that are too large for full tree traversals.
We develop a new framework for constructing stochastic regret minimization methods.
In extensive experiments on three games, some variants of our methods outperform MCCFR.
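The sampling idea underlying MCCFR can be shown on a toy distribution rather than a game tree; the probabilities and utilities below are made-up numbers for an unbiasedness check:

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.5, 0.3, 0.2])    # true outcome probabilities
q = np.array([1/3, 1/3, 1/3])    # sampling distribution
u = np.array([1.0, -2.0, 4.0])   # terminal utilities

# Sample outcomes from q and reweight each utility by p/q: the
# estimator stays unbiased for E_p[u] while evaluating only one
# outcome per sample, which is how MCCFR avoids full tree traversals.
z = rng.choice(3, size=200_000, p=q)
print(np.mean(u[z] * p[z] / q[z]), p @ u)  # both ~ 0.7
```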
arXiv Detail & Related papers (2020-02-19T23:05:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.