ESCHER: Eschewing Importance Sampling in Games by Computing a History
Value Function to Estimate Regret
- URL: http://arxiv.org/abs/2206.04122v1
- Date: Wed, 8 Jun 2022 18:43:45 GMT
- Title: ESCHER: Eschewing Importance Sampling in Games by Computing a History
Value Function to Estimate Regret
- Authors: Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm
- Abstract summary: Recent techniques for approximating Nash equilibria in very large games leverage neural networks to learn approximately optimal policies (strategies).
DREAM, the only current CFR-based neural method that is model free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).
We show that a deep learning version of ESCHER outperforms the prior state of the art -- DREAM and neural fictitious self play (NFSP) -- and the difference becomes dramatic as game size increases.
- Score: 97.73233271730616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent techniques for approximating Nash equilibria in very large games
leverage neural networks to learn approximately optimal policies (strategies).
One promising line of research uses neural networks to approximate
counterfactual regret minimization (CFR) or its modern variants. DREAM, the
only current CFR-based neural method that is model free and therefore scalable
to very large games, trains a neural network on an estimated regret target that
can have extremely high variance due to an importance sampling term inherited
from Monte Carlo CFR (MCCFR). In this paper we propose an unbiased model-free
method that does not require any importance sampling. Our method, ESCHER, is
principled and is guaranteed to converge to an approximate Nash equilibrium
with high probability in the tabular case. We show that the variance of the
estimated regret of a tabular version of ESCHER with an oracle value function
is significantly lower than that of outcome sampling MCCFR and tabular DREAM
with an oracle value function. We then show that a deep learning version of
ESCHER outperforms the prior state of the art -- DREAM and neural fictitious
self play (NFSP) -- and the difference becomes dramatic as game size increases.
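To make the contrast concrete, here is a minimal Python sketch (not the authors' code) of the two kinds of regret estimate discussed above: an outcome-sampling MCCFR-style estimate, whose division by the sampling probability of the trajectory is the high-variance importance-sampling term, and an ESCHER-style estimate that instead queries a learned history value function and involves no importance weighting. All function and variable names are illustrative assumptions.

```python
# Illustrative sketch only; not ESCHER's implementation or API.

def os_mccfr_estimate(sampled_return, reach_others, sample_prob):
    """Outcome-sampling MCCFR-style counterfactual value for one sampled
    trajectory. The division by sample_prob (the probability the sampling
    policy reaches this terminal history) is the importance-sampling term
    that can blow up when the trajectory is unlikely."""
    return sampled_return * reach_others / sample_prob


def escher_estimate(value_fn, history, action):
    """ESCHER-style estimate: query a learned history value function for the
    value of taking `action` at `history` and for the history itself, so no
    importance-sampling correction is needed."""
    return value_fn(history, action) - value_fn(history, None)


if __name__ == "__main__":
    # A rarely sampled trajectory (sample_prob = 0.01) inflates the MCCFR-style
    # estimate to 100x the raw return, even though the true value is bounded.
    print(os_mccfr_estimate(sampled_return=1.0, reach_others=0.5, sample_prob=0.01))

    # A toy (hypothetical) value function keeps the ESCHER-style estimate bounded.
    toy_values = {("h", "a"): 0.6, ("h", None): 0.4}
    print(escher_estimate(lambda h, a: toy_values[(h, a)], "h", "a"))
```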
Related papers
- Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax
Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes [7.433327915285969]
We prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss.
We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence.
arXiv Detail & Related papers (2024-01-08T23:54:46Z)
- More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory [12.689249854199982]
We show that the test risk of RF regression decreases monotonically with both the number of features and the number of samples.
We then demonstrate that, for a large class of tasks characterized by power-law eigenstructure, training to near-zero training loss is obligatory.
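As a concrete reminder of what RF (random-feature) regression is, here is a small, self-contained numpy sketch; the feature count, ridge penalty, and toy data are illustrative choices, not the paper's setup.

```python
import numpy as np

# Toy random-feature (RF) ridge regression; all hyperparameters are illustrative.
rng = np.random.default_rng(0)
n_train, dim, n_features, ridge = 200, 10, 512, 1e-3

X = rng.normal(size=(n_train, dim))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n_train)      # toy target

W = rng.normal(size=(dim, n_features)) / np.sqrt(dim)      # fixed random projection
Phi = np.maximum(X @ W, 0.0)                                # random ReLU features

# Ridge regression on the random features (the only trained parameters).
beta = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(n_features), Phi.T @ y)

X_test = rng.normal(size=(50, dim))
pred = np.maximum(X_test @ W, 0.0) @ beta
print("test MSE:", float(np.mean((pred - np.sin(X_test[:, 0])) ** 2)))
```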
arXiv Detail & Related papers (2023-11-24T18:27:41Z)
- A Kernel-Expanded Stochastic Neural Network [10.837308632004644]
Deep neural networks often get trapped in local minima during training.
The new kernel-expanded stochastic neural network (K-StoNet) model reformulates the network as a latent variable model.
The model can be easily trained using the imputation-regularized optimization (IRO) algorithm.
arXiv Detail & Related papers (2022-01-14T06:42:42Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
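The general mechanism named in the title, differentiating through an equilibrium (fixed-point) state with the implicit function theorem rather than backpropagating through the forward iterations, can be sketched in a few lines of numpy; the dynamics f(z) = tanh(Wz + x), the loss, and all shapes below are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Illustrative fixed-point dynamics; a small W keeps the map a contraction.
rng = np.random.default_rng(0)
n = 5
W = 0.1 * rng.normal(size=(n, n))
x = rng.normal(size=n)

def f(z):
    return np.tanh(W @ z + x)

# Forward pass: iterate to the equilibrium z* with z* = f(z*).
z = np.zeros(n)
for _ in range(200):
    z = f(z)

# Backward pass via the implicit function theorem, for the toy loss
# L(z*) = 0.5 * ||z*||^2 (so dL/dz* = z*):
#   df/dz = diag(s) @ W  and  df/dx = diag(s),  with s = 1 - tanh(Wz* + x)^2.
#   dL/dx = (df/dx)^T (I - df/dz)^{-T} dL/dz*, with no unrolled backprop.
s = 1.0 - np.tanh(W @ z + x) ** 2
J = s[:, None] * W
adjoint = np.linalg.solve((np.eye(n) - J).T, z)   # solves (I - J)^T adjoint = dL/dz*
grad_x = s * adjoint                              # dL/dx
print(grad_x)
```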
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.
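For reference, the quantity being bounded is the standard external regret over T rounds; the notation below is ours, not the paper's.

```latex
% External regret of the realized strategies x_1, ..., x_T against the best
% fixed strategy in hindsight; the result above bounds R_T = O(T^{3/4})
% with high probability.
R_T \;=\; \max_{x^{*} \in X} \sum_{t=1}^{T} \bigl( u_t(x^{*}) - u_t(x_t) \bigr)
```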
arXiv Detail & Related papers (2021-03-08T04:03:24Z)
- A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a "slow start, fast decay" learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
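A minimal sketch of what a "slow start, fast decay" learning-rate schedule could look like is given below; the warmup length, peak rate, and decay factor are illustrative guesses, not the paper's exact schedule.

```python
def slow_start_fast_decay(epoch, peak_lr=0.01, warmup_epochs=5, decay=0.5):
    """Linear warmup ("slow start") followed by exponential decay ("fast decay").
    All constants are illustrative, not taken from the paper."""
    if epoch < warmup_epochs:
        return peak_lr * (epoch + 1) / warmup_epochs
    return peak_lr * decay ** (epoch - warmup_epochs)

if __name__ == "__main__":
    for epoch in range(10):
        print(epoch, round(slow_start_fast_decay(epoch), 5))
```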
arXiv Detail & Related papers (2020-12-25T20:50:15Z)
- Model-free Neural Counterfactual Regret Minimization with Bootstrap Learning [10.816436463322237]
Current CFR algorithms have to approximate the cumulative regrets with neural networks.
A new CFR variant, Recursive CFR, is proposed, in which the cumulative regrets are recovered by Recursive Substitute Values (RSVs).
It is proved that the new Recursive CFR converges to a Nash equilibrium.
Experimental results show that the new algorithm can match the state-of-the-art neural CFR algorithms but with less training overhead.
arXiv Detail & Related papers (2020-12-03T12:26:50Z)
- Measurement error models: from nonparametric methods to deep neural networks [3.1798318618973362]
We propose an efficient neural network design for estimating measurement error models.
We use a fully connected feed-forward neural network to approximate the regression function $f(x)$.
We conduct an extensive numerical study to compare the neural network approach with classical nonparametric methods.
arXiv Detail & Related papers (2020-07-15T06:05:37Z)