Temporal Induced Self-Play for Stochastic Bayesian Games
- URL: http://arxiv.org/abs/2108.09444v1
- Date: Sat, 21 Aug 2021 05:36:42 GMT
- Title: Temporal Induced Self-Play for Stochastic Bayesian Games
- Authors: Weizhe Chen, Zihan Zhou, Yi Wu, Fei Fang
- Abstract summary: We propose Temporal-Induced Self-Play (TISP) to find strategies with decent performances from any decision point onward.
TISP uses belief-space representation, backward induction, policy learning, and non-parametric approximation.
We prove that TISP-based algorithms can find approximate Perfect Bayesian Equilibrium in zero-sum one-sided stochastic Bayesian games with a finite horizon.
- Score: 32.88124137877018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One practical requirement in solving dynamic games is to ensure that the
players play well from any decision point onward. To satisfy this requirement,
existing efforts focus on equilibrium refinement, but the scalability and
applicability of existing techniques are limited. In this paper, we propose
Temporal-Induced Self-Play (TISP), a novel reinforcement learning-based
framework to find strategies with decent performances from any decision point
onward. TISP uses belief-space representation, backward induction, policy
learning, and non-parametric approximation. Building upon TISP, we design a
policy-gradient-based algorithm TISP-PG. We prove that TISP-based algorithms
can find approximate Perfect Bayesian Equilibrium in zero-sum one-sided
stochastic Bayesian games with finite horizon. We test TISP-based algorithms in
various games, including finitely repeated security games and a grid-world
game. The results show that TISP-PG is more scalable than existing mathematical
programming-based methods and significantly outperforms other learning-based
methods.
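The abstract gives only a high-level description of TISP; the sketch below shows one way its pieces (belief sampling, backward induction over the horizon, per-step policy learning, and nearest-neighbor value approximation) could fit together. All names, the Dirichlet belief sampling, and the toy per-step learner are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a TISP-style loop: belief-space representation,
# backward induction, per-step policy learning, non-parametric approximation.
# Everything here is an illustrative stand-in for the method in the abstract.
import numpy as np

HORIZON = 4           # finite horizon T
N_BELIEF_POINTS = 16  # sampled belief points per time step
N_TYPES = 2           # one-sided incomplete information: types of one player

def sample_beliefs(n, n_types, rng):
    """Sample belief points (distributions over types) from a Dirichlet."""
    return rng.dirichlet(np.ones(n_types), size=n)

def train_policy_at_step(t, belief, next_value_fn, rng):
    """Placeholder for per-step policy learning; TISP-PG would run a
    policy-gradient inner loop here, bootstrapping from next_value_fn."""
    policy = rng.dirichlet(np.ones(3))                         # stand-in strategy
    immediate = float(belief @ rng.normal(size=belief.size))   # stand-in payoff
    return policy, immediate + 0.95 * next_value_fn(belief)

def make_value_fn(points, values):
    """Non-parametric (1-nearest-neighbor) value approximation in belief space."""
    def value_fn(belief):
        d = np.linalg.norm(points - belief, axis=1)
        return values[np.argmin(d)]
    return value_fn

rng = np.random.default_rng(0)
value_fn = lambda belief: 0.0                  # terminal value beyond the horizon
tables = {}
for t in reversed(range(HORIZON)):             # backward induction over time
    points = sample_beliefs(N_BELIEF_POINTS, N_TYPES, rng)
    results = [train_policy_at_step(t, b, value_fn, rng) for b in points]
    values = np.array([v for _, v in results])
    tables[t] = (points, [p for p, _ in results])
    value_fn = make_value_fn(points, values)   # used when training step t-1
```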
Related papers
- TSI-Bench: Benchmarking Time Series Imputation [52.27004336123575]
TSI-Bench is a comprehensive benchmark suite for time series imputation utilizing deep learning techniques.
The TSI-Bench pipeline standardizes experimental settings to enable fair evaluation of imputation algorithms.
TSI-Bench innovatively provides a systematic paradigm to tailor time series forecasting algorithms for imputation purposes.
arXiv Detail & Related papers (2024-06-18T16:07:33Z) - A unified stochastic approximation framework for learning in games [82.74514886461257]
We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite).
The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, exponential/multiplicative weights for learning in finite games, optimistic and bandit variants of the above, etc.
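For a concrete instance of the algorithm families such a framework covers, the following is a minimal multiplicative-weights (exponential-weights) sketch for a finite zero-sum game; the rock-paper-scissors payoff matrix, step size, and iteration count are illustrative assumptions.

```python
# Hedged sketch: exponential/multiplicative-weights learning in a finite
# zero-sum game, one of the algorithm families the framework covers.
import numpy as np

A = np.array([[0.0, -1.0, 1.0],    # row player's payoffs (rock-paper-scissors)
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
eta, T = 0.1, 2000
x = np.ones(3) / 3                 # row player's mixed strategy
y = np.ones(3) / 3                 # column player's mixed strategy
x_avg, y_avg = np.zeros(3), np.zeros(3)
for _ in range(T):
    # each player exponentiates its payoff vector against the current opponent
    x = x * np.exp(eta * (A @ y));    x /= x.sum()
    y = y * np.exp(-eta * (A.T @ x)); y /= y.sum()
    x_avg += x / T
    y_avg += y / T
print(x_avg, y_avg)  # time-averaged strategies approach the uniform equilibrium
```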
arXiv Detail & Related papers (2022-06-08T14:30:38Z) - Scalable Deep Reinforcement Learning Algorithms for Mean Field Games [60.550128966505625]
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents.
Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods.
Existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values.
We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm.
The second one is an online mixing method.
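The first of these ideas, distilling the historical average strategy into a function approximator, can be illustrated with a minimal sketch in which a softmax regression stands in for the neural network; the synthetic history and learning rate are illustrative assumptions.

```python
# Hedged sketch of distilling an averaged (mixed) strategy from historical
# best responses into a parametric model, as in Fictitious Play for MFGs.
# A plain softmax regression stands in for the neural network.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_iters = 8, 4, 500
# best-response policies collected across earlier Fictitious Play iterations
history = [rng.dirichlet(np.ones(n_actions), size=n_states) for _ in range(10)]
target = np.mean(history, axis=0)            # the mixed (averaged) strategy
X = np.eye(n_states)                         # one-hot state features
W = np.zeros((n_states, n_actions))
for _ in range(n_iters):                     # distillation = cross-entropy fit
    logits = X @ W
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    W -= 0.5 * X.T @ (probs - target)        # gradient of the cross-entropy loss
```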
arXiv Detail & Related papers (2022-03-22T18:10:32Z) - Scalable Online Planning via Reinforcement Learning Fine-Tuning [25.27878823988181]
Tabular search methods do not scale well with the size of the search space.
We replace tabular search with online model-based fine-tuning of a policy neural network via reinforcement learning.
In particular, we use our search algorithm to achieve a new state-of-the-art result in self-play Hanabi.
arXiv Detail & Related papers (2021-09-30T17:59:11Z) - Emphatic Algorithms for Deep Reinforcement Learning [43.17171330951343]
Temporal difference learning algorithms can become unstable when combined with function approximation and off-policy sampling.
The emphatic temporal difference algorithm ETD($\lambda$) ensures convergence in the linear case by appropriately weighting the TD($\lambda$) updates.
We show that naively adapting ETD($\lambda$) to popular deep reinforcement learning algorithms, which use forward-view multi-step returns, results in poor performance.
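The linear ETD($\lambda$) update that the paper starts from is well documented; a minimal sketch of it on a toy chain is shown below, with the environment, feature map, and constants as illustrative assumptions.

```python
# Hedged sketch of the linear ETD(lambda) update (follow-on trace, emphasis,
# emphatic eligibility trace). The toy chain, features, and constants are
# illustrative assumptions, not the paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_feats = 5, 4
phi = rng.normal(size=(n_states, n_feats))      # fixed random features
gamma, lam, alpha = 0.9, 0.8, 0.01
w = np.zeros(n_feats)                            # linear value weights
F, e = 0.0, np.zeros(n_feats)                    # follow-on trace, eligibility trace
s = 0
for t in range(10_000):
    s_next = (s + rng.integers(1, 3)) % n_states # toy transition
    r = 1.0 if s_next == 0 else 0.0              # toy reward
    rho, i_s = 1.0, 1.0                          # on-policy ratio, uniform interest
    F = rho * gamma * F + i_s                    # follow-on trace
    M = lam * i_s + (1.0 - lam) * F              # emphasis
    delta = r + gamma * phi[s_next] @ w - phi[s] @ w
    e = rho * (gamma * lam * e + M * phi[s])     # emphatic eligibility trace
    w = w + alpha * delta * e
    s = s_next
```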
arXiv Detail & Related papers (2021-06-21T12:11:39Z) - Optimal control of robust team stochastic games [5.425935258756356]
We propose a model of "robust" team games, where players utilize a robust optimization approach to make decisions.
We develop a learning algorithm in the form of a Gauss-Seidel modified policy iteration and prove its convergence.
Some numerical simulations are presented to demonstrate the effectiveness of the algorithm.
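As a point of reference for the algorithmic template, the following is a minimal sketch of Gauss-Seidel modified policy iteration on an ordinary MDP; the random MDP and constants are illustrative assumptions, and the robust-optimization step specific to the paper's team-game model is not shown.

```python
# Hedged sketch of Gauss-Seidel modified policy iteration on a plain MDP.
# The random MDP, sweep count, and constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma, m_sweeps = 6, 3, 0.9, 5
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] is a distribution over s'
R = rng.normal(size=(nS, nA))                    # immediate rewards
V = np.zeros(nS)
policy = np.zeros(nS, dtype=int)
for _ in range(200):
    # partial policy evaluation: m Gauss-Seidel sweeps (in-place updates,
    # so later states already use the freshly updated values)
    for _ in range(m_sweeps):
        for s in range(nS):
            a = policy[s]
            V[s] = R[s, a] + gamma * P[s, a] @ V
    # policy improvement (greedy with respect to the current V)
    Q = R + gamma * P @ V            # shape (nS, nA)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
```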
arXiv Detail & Related papers (2021-05-16T10:42:09Z) - Model-Free Online Learning in Unknown Sequential Decision Making
Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.
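For context on the CFR-style methods the entry refers to, a minimal sketch of regret matching (the per-decision regret minimizer at their core) against a fixed opponent is shown below; the payoff matrix and opponent strategy are illustrative assumptions.

```python
# Hedged sketch of regret matching, the per-decision regret minimizer used
# inside CFR-style methods. Payoffs and the fixed opponent are assumptions.
import numpy as np

A = np.array([[0.0, -1.0, 1.0],        # row player's payoffs (rock-paper-scissors)
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
regrets = np.zeros(3)
strategy_sum = np.zeros(3)
opponent = np.array([0.4, 0.3, 0.3])   # fixed opposing mixed strategy
for t in range(5000):
    pos = np.maximum(regrets, 0.0)
    sigma = pos / pos.sum() if pos.sum() > 0 else np.ones(3) / 3
    strategy_sum += sigma
    u = A @ opponent                   # expected payoff of each pure action
    regrets += u - sigma @ u           # regret of not having played each action
print(strategy_sum / strategy_sum.sum())  # average strategy (a best response here)
```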
arXiv Detail & Related papers (2021-03-08T04:03:24Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z) - Reinforcement Learning for Mean Field Games with Strategic
Complementarities [10.281006908092932]
We introduce a natural refinement to the equilibrium concept that we call Trembling-Hand-Perfect MFE (T-MFE).
We propose a simple algorithm for computing T-MFE under a known model.
We also introduce a model-free and a model-based approach to learning T-MFE and provide sample complexities of both algorithms.
arXiv Detail & Related papers (2020-06-21T00:31:48Z)