Deep (Predictive) Discounted Counterfactual Regret Minimization
- URL: http://arxiv.org/abs/2511.08174v1
- Date: Wed, 12 Nov 2025 01:44:52 GMT
- Title: Deep (Predictive) Discounted Counterfactual Regret Minimization
- Authors: Hang Xu, Kai Li, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
- Abstract summary: We propose an efficient model-free neural CFR algorithm, overcoming the limitations of existing methods in approximating advanced CFR variants. Experimental results show that, compared with model-free neural algorithms, it exhibits faster convergence in typical imperfect-information games.
- Score: 47.323787598768284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. To enhance CFR's applicability in large games, researchers use neural networks to approximate its behavior. However, existing methods are mainly based on vanilla CFR and struggle to effectively integrate more advanced CFR variants. In this work, we propose an efficient model-free neural CFR algorithm, overcoming the limitations of existing methods in approximating advanced CFR variants. At each iteration, it collects variance-reduced sampled advantages based on a value network, fits cumulative advantages by bootstrapping, and applies discounting and clipping operations to simulate the update mechanisms of advanced CFR variants. Experimental results show that, compared with model-free neural algorithms, it exhibits faster convergence in typical imperfect-information games and demonstrates stronger adversarial performance in a large poker game.
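The update the abstract describes, discounting and clipping cumulative advantages to simulate advanced tabular CFR variants, can be made concrete with a small sketch. The sketch below uses the published DCFR discount schedule (positive regrets scaled by t^α/(t^α+1), negative by t^β/(t^β+1)) together with CFR+-style clipping; the function names and the exact combination are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# DCFR defaults from Brown & Sandholm (2019); the paper's exact schedule
# may differ, so treat these as an illustrative assumption.
ALPHA, BETA = 1.5, 0.0

def dcfr_discount(cum_adv: np.ndarray, t: int) -> np.ndarray:
    """Discount cumulative advantages before adding iteration t's samples:
    positive entries scaled by t^alpha/(t^alpha+1), negative by t^beta/(t^beta+1)."""
    pos_w = t**ALPHA / (t**ALPHA + 1)
    neg_w = t**BETA / (t**BETA + 1)
    return np.where(cum_adv > 0, cum_adv * pos_w, cum_adv * neg_w)

def cfr_plus_clip(cum_adv: np.ndarray) -> np.ndarray:
    """CFR+-style clipping: floor cumulative advantages at zero."""
    return np.maximum(cum_adv, 0.0)

def regret_matching(cum_adv: np.ndarray) -> np.ndarray:
    """Map cumulative advantages to a strategy via regret matching."""
    pos = np.maximum(cum_adv, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(cum_adv), 1.0 / len(cum_adv))
```

In the neural setting, `cum_adv` would be the output of the value network fitted by bootstrapping rather than a stored table; the discount and clip operations are what lets the method imitate DCFR- and CFR+-style update mechanisms.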
Related papers
- Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent [44.080852682765276]
This work explores minimizing weighted counterfactual regret with optimistic Online Mirror Descent (OMD).
It integrates PCFR+ and Discounted CFR (DCFR) in a principled manner, swiftly mitigating the negative effects of dominated actions.
Experimental results demonstrate the fast convergence of the resulting algorithm, PDCFR+, in common imperfect-information games.
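A minimal sketch of how a predictive (optimistic) step can be combined with DCFR-style discounting, assuming the prediction is the most recent instantaneous regret (the usual choice in predictive CFR+); this is an illustrative reading, not PDCFR+'s exact published update:

```python
import numpy as np

def pdcfr_plus_step(R: np.ndarray, r_t: np.ndarray, t: int, alpha: float = 1.5):
    """One discounted, predictive regret-matching+ step (illustrative).

    R   : cumulative clipped regrets over actions
    r_t : instantaneous counterfactual regrets at iteration t
    """
    w = t**alpha / (t**alpha + 1)       # DCFR-style discount on past regrets
    R = np.maximum(R * w + r_t, 0.0)    # CFR+-style clipping at zero
    pred = np.maximum(R + r_t, 0.0)     # optimistic step: add predicted regret
    total = pred.sum()
    sigma = pred / total if total > 0 else np.full(len(R), 1.0 / len(R))
    return R, sigma
```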
arXiv Detail & Related papers (2024-04-22T05:37:22Z)
- Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play [0.0]
We introduce a new MC-based algorithm for solving imperfect-information games, called MCCFVFP.
MCCFVFP combines CFR's counterfactual value calculations with fictitious play's best response strategy.
Results show that MCCFVFP achieves convergence speeds approximately 20% to 50% faster than the most advanced MCCFR variants.
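How counterfactual values can drive a fictitious-play-style update is sketched below; the hard best response and the 1/(t+1) averaging are assumptions about the general scheme, not the paper's exact MCCFVFP procedure:

```python
import numpy as np

def fp_update(avg_strategy: np.ndarray, cf_values: np.ndarray, t: int) -> np.ndarray:
    """Fold a best response to sampled counterfactual values into the
    average strategy with a fictitious-play step size (illustrative)."""
    best_response = np.zeros_like(avg_strategy)
    best_response[np.argmax(cf_values)] = 1.0   # all weight on the best action
    return avg_strategy + (best_response - avg_strategy) / (t + 1)
```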
arXiv Detail & Related papers (2023-09-04T09:16:49Z)
- Hierarchical Deep Counterfactual Regret Minimization [53.86223883060367]
In this paper, we introduce Hierarchical Deep CFR (HDCFR), the first hierarchical version of Deep CFR and an innovative method that boosts learning efficiency in tasks involving extensively large state spaces and deep game trees.
A notable advantage of HDCFR over previous works is its ability to facilitate learning with predefined (human) expertise and foster the acquisition of skills that can be transferred to similar tasks.
arXiv Detail & Related papers (2023-05-27T02:05:41Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples.
In our framework, each block can be trained independently, so it can be easily deployed to parallel acceleration systems.
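A minimal PyTorch sketch of the block-local training this describes, assuming simple linear blocks (the class and function names are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class CaFoBlock(nn.Module):
    """One cascaded block: a feature extractor plus its own label head,
    so every block outputs a label distribution directly."""
    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        h = self.features(x)
        return h, self.head(h)

def train_block(block, x, y, steps: int = 100, lr: float = 1e-3):
    """Train one block in isolation on a purely local loss; its input is
    the detached output of the previous block, so no gradient crosses
    block boundaries and blocks can be trained independently."""
    opt = torch.optim.Adam(block.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        h, logits = block(x)
        loss = loss_fn(logits, y)
        opt.zero_grad(); loss.backward(); opt.step()
    return h.detach()  # detached features feed the next block
```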
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret [97.73233271730616]
Recent techniques for approximating Nash equilibria in very large games leverage neural networks to learn approximately optimal policies (strategies).
DREAM, the only current CFR-based neural method that is model-free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).
We show that a deep learning version of ESCHER outperforms the prior state of the art -- DREAM and neural fictitious self play (NFSP) -- and the difference becomes dramatic as game size increases.
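The variance argument can be made concrete with a sketch of the regret target, assuming a learned history value function q(h, a) (the names here are illustrative):

```python
import numpy as np

def escher_regret_target(q_ha: np.ndarray, strategy: np.ndarray) -> np.ndarray:
    """ESCHER-style regret estimate at a sampled history (illustrative).

    The baseline is the current strategy's expected value under the learned
    history values, and the target is q(h, a) - v(h). No importance-sampling
    ratio appears, unlike MCCFR-derived targets, which is the claimed source
    of the variance reduction."""
    v_h = float(strategy @ q_ha)   # expected value of the current strategy
    return q_ha - v_h
```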
arXiv Detail & Related papers (2022-06-08T18:43:45Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximate gradient descent method based on a recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize the sum-of-ranked-range loss, which likewise lacks efficient solvers.
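For concreteness, a brute-force pairwise surrogate for partial AUC in an FPR range looks like the sketch below; the paper's contribution is an efficient smoothed solver for this kind of objective, and this naive form (with illustrative parameter names) only states what is being optimized:

```python
import numpy as np

def partial_auc_surrogate(pos_scores, neg_scores, fpr_lo=0.0, fpr_hi=0.3):
    """Average squared-hinge pairwise loss over negatives whose rank puts
    their false-positive rate inside [fpr_lo, fpr_hi] (illustrative)."""
    pos_scores = np.asarray(pos_scores)
    neg_sorted = np.sort(np.asarray(neg_scores))[::-1]   # hardest negatives first
    n = len(neg_sorted)
    lo, hi = int(np.floor(fpr_lo * n)), int(np.ceil(fpr_hi * n))
    hard_negs = neg_sorted[lo:hi]                        # negatives in the FPR range
    margins = pos_scores[:, None] - hard_negs[None, :]   # pairwise score margins
    return float(np.mean(np.maximum(0.0, 1.0 - margins) ** 2))
```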
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent [67.60077332154853]
Counterfactual Regret Minimization (CFR) is a regret minimization algorithm that minimizes the total regret by minimizing the local counterfactual regrets.
Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD) algorithms are regret minimization algorithms in Online Convex Optimization.
We provide a new way to analyze and extend CFRs by proving that CFR with Regret Matching and CFR with Regret Matching+ are special forms of FTRL and OMD, respectively.
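The local update in question is regret matching; writing it out makes the claimed correspondence concrete (the notation is standard CFR usage, not necessarily the paper's):

```latex
% Regret matching: the strategy is proportional to clipped cumulative regrets.
\[
  \sigma^{t+1}(I,a)
    = \frac{\left[R^{t}(I,a)\right]^{+}}
           {\sum_{a' \in A(I)} \left[R^{t}(I,a')\right]^{+}},
  \qquad
  R^{t}(I,a) = \sum_{\tau=1}^{t} r^{\tau}(I,a),
  \qquad [x]^{+} = \max(x, 0).
\]
```

The paper's result is that this update (and its Regret Matching+ counterpart) arises as a special case of FTRL (respectively OMD) under suitably chosen regularizers, so analysis tools from online convex optimization carry over to CFR.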
arXiv Detail & Related papers (2021-10-11T02:12:25Z)
- NNCFR: Minimize Counterfactual Regret with Neural Networks [4.418221583366099]
This paper introduces Neural Network Counterfactual Regret Minimization (NNCFR), an improved variant of Deep CFR.
NNCFR converges faster and performs more stably than Deep CFR, and it outperforms Deep CFR with respect to exploitability and head-to-head performance on test games.
arXiv Detail & Related papers (2021-05-26T04:58:36Z)
- Model-free Neural Counterfactual Regret Minimization with Bootstrap Learning [10.816436463322237]
Current neural CFR algorithms have to approximate the cumulative regrets with neural networks.
A new CFR variant, Recursive CFR, is proposed, in which the cumulative regrets are recovered by Recursive Substitute Values (RSVs).
It is proved that the new Recursive CFR converges to a Nash equilibrium.
Experimental results show that the new algorithm can match the state-of-the-art neural CFR algorithms but with less training overhead.
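A minimal sketch of fitting cumulative regrets by bootstrapping, in the spirit of this abstract; the TD-style target and the names are assumptions, not the paper's exact Recursive Substitute Value construction:

```python
import torch
import torch.nn as nn

def bootstrap_step(regret_net: nn.Module, opt, infoset_feats, sampled_regret):
    """Regress the network toward its own frozen previous estimate plus the
    newly sampled instantaneous regret, a TD-style bootstrap that avoids
    storing cumulative regrets in tables (illustrative)."""
    with torch.no_grad():
        target = regret_net(infoset_feats) + sampled_regret  # bootstrapped target
    loss = nn.functional.mse_loss(regret_net(infoset_feats), target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```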
arXiv Detail & Related papers (2020-12-03T12:26:50Z)