Model-free Neural Counterfactual Regret Minimization with Bootstrap
Learning
- URL: http://arxiv.org/abs/2012.01870v2
- Date: Sun, 9 May 2021 12:42:42 GMT
- Title: Model-free Neural Counterfactual Regret Minimization with Bootstrap
Learning
- Authors: Weiming Liu, Bin Li, Julian Togelius
- Abstract summary: Current neural CFR algorithms have to approximate the cumulative regrets with neural networks.
A new CFR variant, Recursive CFR, is proposed, in which the cumulative regrets are recovered by Recursive Substitute Values (RSVs).
It is proved that the new Recursive CFR converges to a Nash equilibrium.
Experimental results show that the new algorithm matches state-of-the-art neural CFR algorithms with less training overhead.
- Score: 10.816436463322237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Counterfactual Regret Minimization (CFR) has achieved many fascinating
results in solving large-scale Imperfect Information Games (IIGs). Neural CFR
is one of the promising techniques that can effectively reduce computation and
memory consumption by generalizing decision information between similar states.
However, current neural CFR algorithms have to approximate the cumulative
regrets with neural networks. This usually results in high-variance
approximation because regrets from different iterations could be very
different. The problem can be even worse when importance sampling is used,
which is required for model-free algorithms. In this paper, a new CFR variant,
Recursive CFR, is proposed, in which the cumulative regrets are recovered by
Recursive Substitute Values (RSVs) that are recursively defined and
independently calculated between iterations. It is proved that the new Recursive CFR
converges to a Nash equilibrium. Based on Recursive CFR, a new model-free
neural CFR algorithm with bootstrap learning is proposed. Experimental results
show that the new algorithm can match the state-of-the-art neural CFR
algorithms but with less training overhead.
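For context, the cumulative regrets in question can be written down exactly in the tabular setting: vanilla CFR keeps, for every information set and action, a counterfactual regret summed over all past iterations, and derives the next strategy by regret matching. The sketch below is a minimal illustration of that bookkeeping under my own naming, not code from the paper; it is this iteration-summed table that neural CFR methods regress with a network, and that Recursive CFR instead recovers from per-iteration substitute values.

```python
# A minimal tabular sketch (names are mine, not the paper's) of the quantities
# that neural CFR methods approximate: per-infoset cumulative counterfactual
# regrets and the regret-matching strategy derived from them.
from collections import defaultdict

cum_regret = defaultdict(lambda: defaultdict(float))    # R^T(I, a), summed over iterations
cum_strategy = defaultdict(lambda: defaultdict(float))  # numerator of the average strategy

def regret_matching(infoset, actions):
    """Next strategy sigma(I, a) proportional to the positive part of R^T(I, a)."""
    positive = {a: max(cum_regret[infoset][a], 0.0) for a in actions}
    total = sum(positive.values())
    if total > 0.0:
        return {a: r / total for a, r in positive.items()}
    return {a: 1.0 / len(actions) for a in actions}      # uniform when no positive regret

def accumulate(infoset, actions, action_values, node_value, my_reach, opp_reach):
    """One CFR iteration's bookkeeping at an information set: add the
    counterfactual-reach-weighted instantaneous regrets and the reach-weighted
    current strategy to the running sums."""
    sigma = regret_matching(infoset, actions)
    for a in actions:
        cum_regret[infoset][a] += opp_reach * (action_values[a] - node_value)
        cum_strategy[infoset][a] += my_reach * sigma[a]
```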
Related papers
- Hierarchical Deep Counterfactual Regret Minimization [53.86223883060367]
In this paper, we introduce the first hierarchical version of Deep CFR (HDCFR), an innovative method that boosts learning efficiency in tasks involving extensively large state spaces and deep game trees.
A notable advantage of HDCFR over previous works is its ability to facilitate learning with predefined (human) expertise and foster the acquisition of skills that can be transferred to similar tasks.
arXiv Detail & Related papers (2023-05-27T02:05:41Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely Cascaded Forward (CaFo) algorithm, which does not rely on BP optimization as that in FF.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret [97.73233271730616]
Recent techniques for approximating Nash equilibria in very large games leverage neural networks to learn approximately optimal policies (strategies).
DREAM, the only current CFR-based neural method that is model-free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR); a toy illustration of this variance issue appears after this list.
We show that a deep learning version of ESCHER outperforms the prior state of the art, DREAM and neural fictitious self-play (NFSP), and the difference becomes dramatic as game size increases.
arXiv Detail & Related papers (2022-06-08T18:43:45Z)
- Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent [67.60077332154853]
Counterfactual Regret Minimization (CFR) is a regret minimization algorithm that minimizes the total regret by minimizing the local counterfactual regrets.
Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD) algorithms are regret minimization algorithms in Online Convex Optimization.
We provide a new way to analyze and extend CFRs, by proving that CFR with Regret Matching and CFR with Regret Matching+ are special forms of FTRL and OMD.
arXiv Detail & Related papers (2021-10-11T02:12:25Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence.
Previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions.
We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z)
- A Novel Neural Network Training Framework with Data Assimilation [2.948167339160823]
A gradient-free training framework based on data assimilation is proposed to avoid the calculation of gradients.
The results show that the proposed training framework performed better than the gradient descent method.
arXiv Detail & Related papers (2020-10-06T11:12:23Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with nonconvexity renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
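The abstract's observation that importance sampling, which model-free traversal requires and which DREAM inherits from Monte Carlo CFR (see the ESCHER entry above), inflates the variance of regret targets can be seen in a small toy example. The sketch below is my own illustration under assumed names, not code from any of these papers: an unbiased estimate of a value under the target policy divides each sampled payoff by its sampling probability, so rarely sampled but relevant outcomes receive very large corrective weights.

```python
# Toy illustration (my own, not from the papers above) of why importance-sampled
# regret targets are high variance: the estimator reweights each sampled payoff
# by target_prob / sample_prob, so rarely sampled outcomes get huge weights.
import random
import statistics

def sampled_value(target_prob, sample_prob, payoff_fn, n=10_000):
    """Estimate E[payoff under the target policy] from actions drawn
    under a different sampling policy, outcome-sampling style."""
    actions = list(sample_prob)
    weights = [sample_prob[a] for a in actions]
    estimates = []
    for _ in range(n):
        a = random.choices(actions, weights=weights)[0]
        w = target_prob[a] / sample_prob[a]          # importance weight, can explode
        estimates.append(w * payoff_fn(a))
    return statistics.mean(estimates), statistics.stdev(estimates)

# A rarely sampled but important action makes the per-sample spread large,
# even though the estimate itself stays unbiased (true value here is 0.5).
mean, std = sampled_value(
    target_prob={"a": 0.5, "b": 0.5},
    sample_prob={"a": 0.95, "b": 0.05},   # "b" sampled rarely -> weight 10x
    payoff_fn=lambda a: 1.0 if a == "b" else 0.0,
)
print(f"estimate={mean:.3f}  per-sample std={std:.3f}")
```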