Efficient exploration of zero-sum stochastic games
- URL: http://arxiv.org/abs/2002.10524v1
- Date: Mon, 24 Feb 2020 20:30:38 GMT
- Title: Efficient exploration of zero-sum stochastic games
- Authors: Carlos Martin, Tuomas Sandholm
- Abstract summary: We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
- Score: 83.28949556413717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the increasingly important and common game-solving setting
where we do not have an explicit description of the game but only oracle access
to it through gameplay, such as in financial or military simulations and
computer games. During a limited-duration learning phase, the algorithm can
control the actions of both players in order to try to learn the game and how
to play it well. After that, the algorithm has to produce a strategy that has
low exploitability. Our motivation is to quickly learn strategies that have low
exploitability in situations where evaluating the payoffs of a queried strategy
profile is costly. For the stochastic game setting, we propose using the
distribution of state-action value functions induced by a belief distribution
over possible environments. We compare the performance of various exploration
strategies for this task, including generalizations of Thompson sampling and
Bayes-UCB to this new setting. These two consistently outperform other
strategies.
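To make the proposed exploration strategies concrete, below is a minimal single-state sketch, assuming a zero-sum matrix game with Gaussian payoff noise and an entry-wise Gaussian posterior as the belief distribution over environments. Thompson-style exploration samples a game from the posterior and queries the resulting maximin/minimax joint action. The model, constants, and pure-strategy play are illustrative assumptions; the paper's algorithms operate on full stochastic games through distributions over state-action value functions.

```python
# Hypothetical single-state sketch of Thompson-style exploration for a
# zero-sum matrix game with noisy payoffs; NOT the authors' exact algorithm.
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, noise_sd = 3, 3, 1.0
true_game = rng.normal(size=(n_rows, n_cols))        # hidden payoff matrix

# Conjugate Normal posterior per payoff entry (known noise variance).
prior_var = 4.0
post_mean = np.zeros((n_rows, n_cols))
post_var = np.full((n_rows, n_cols), prior_var)

def update(i, j, y):
    """Standard Gaussian conjugate update for entry (i, j)."""
    precision = 1.0 / post_var[i, j] + 1.0 / noise_sd**2
    post_mean[i, j] = (post_mean[i, j] / post_var[i, j] + y / noise_sd**2) / precision
    post_var[i, j] = 1.0 / precision

for _ in range(200):
    sample = rng.normal(post_mean, np.sqrt(post_var))  # Thompson: sample a game
    i = np.argmax(sample.min(axis=1))                  # row maximin under sample
    j = np.argmin(sample.max(axis=0))                  # column minimax under sample
    y = true_game[i, j] + rng.normal(0.0, noise_sd)    # costly oracle query
    update(i, j, y)

# After learning, act on the posterior mean (pure maximin as a crude stand-in
# for an equilibrium of the estimated game).
print("estimated maximin row:", np.argmax(post_mean.min(axis=1)))
```

A Bayes-UCB analogue would replace the posterior sample with an upper posterior quantile for the maximizing player and a lower quantile for the minimizing player.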
Related papers
- Variational Methods for Computing Non-Local Quantum Strategies [1.95414377613382]
In a nonlocal game, two noncommunicating players cooperate to convince a referee that they possess a strategy that does not violate the rules of the game.
We show that our algorithm is capable of generating a short-depth circuit that implements an optimal quantum strategy for a graph coloring game.
arXiv Detail & Related papers (2023-11-02T16:17:18Z)
- ApproxED: Approximate exploitability descent via learned best responses [61.17702187957206]
We study the problem of finding an approximate Nash equilibrium of games with continuous action sets.
We propose two new methods that minimize an approximation of exploitability with respect to the strategy profile.
arXiv Detail & Related papers (2023-01-20T23:55:30Z)
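A toy sketch of the exploitability-descent idea from ApproxED above, with exact best responses in rock-paper-scissors standing in for the paper's learned best responses over continuous action sets; the multiplicative-weights update and step-size schedule are illustrative choices.

```python
# Toy exploitability-descent sketch on a matrix game; ApproxED itself targets
# continuous action sets and uses *learned* best responses instead of exact ones.
import numpy as np

A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])          # rock-paper-scissors payoffs (row maximizes)
rng = np.random.default_rng(1)
x = rng.dirichlet(np.ones(3))          # row player's mixed strategy
y = rng.dirichlet(np.ones(3))          # column player's mixed strategy

def exploitability(x, y):
    # Total gain available to both players from deviating to a best response.
    return (A @ y).max() - (x @ A).min()

for t in range(3000):
    lr = 0.5 / np.sqrt(t + 1)
    j_br = np.argmin(x @ A)            # column's best response to x
    i_br = np.argmax(A @ y)            # row's best response to y
    x = x * np.exp(lr * A[:, j_br]); x /= x.sum()   # ascend vs. opponent's BR
    y = y * np.exp(-lr * A[i_br]);   y /= y.sum()   # descend vs. opponent's BR

# Exploitability should shrink toward 0 as both mix toward uniform.
print(round(exploitability(x, y), 3), x.round(2), y.round(2))
```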
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
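A sketch of the strategy representation this paper describes: a mixed strategy over a continuous action space as a small network that pushes Gaussian noise forward to actions, so expected payoffs can be estimated purely by sampling the game. The architecture and payoff oracle below are hypothetical.

```python
# Sketch: a mixed strategy over a continuous action space represented as a
# network that pushes Gaussian noise forward to actions (names are illustrative).
import numpy as np

rng = np.random.default_rng(2)

class NoisePolicy:
    """Tiny MLP mapping latent noise z to an action in [-1, 1]."""
    def __init__(self, latent_dim=4, hidden=16):
        self.W1 = rng.normal(0, 0.5, (hidden, latent_dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.5, (1, hidden))
    def sample(self, n):
        z = rng.normal(size=(n, self.W1.shape[1]))
        h = np.tanh(z @ self.W1.T + self.b1)
        return np.tanh(h @ self.W2.T).ravel()      # n sampled actions

# Monte Carlo estimate of the expected payoff u(a, b) under two such strategies,
# the only game access a gradient-free method needs.
u = lambda a, b: -np.abs(a - b)                    # hypothetical payoff oracle
p, q = NoisePolicy(), NoisePolicy()
print(u(p.sample(10000), q.sample(10000)).mean())
```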
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so that a quality-diversity algorithm (MAP-Elites) can be used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
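For context on the quality-diversity component above, a minimal MAP-Elites archive loop; the fitness function and behavior descriptor are stand-ins for play strength and play-style, not the paper's setup.

```python
# Minimal MAP-Elites loop; in the paper the genome parameterizes the portfolio
# MCTS agent and "behavior" is its play-style, both simplified away here.
import numpy as np

rng = np.random.default_rng(3)
fitness = lambda g: -np.sum(g**2)                  # stand-in for play strength
behavior = lambda g: g[:2]                         # stand-in play-style descriptor
cell = lambda g: tuple(np.clip(((behavior(g) + 1) * 5).astype(int), 0, 9))

archive = {}                                       # cell -> (fitness, genome)
for _ in range(5000):
    if archive and rng.random() < 0.9:             # mutate a random elite
        g = archive[list(archive)[rng.integers(len(archive))]][1]
        g = np.clip(g + rng.normal(0, 0.1, g.shape), -1, 1)
    else:
        g = rng.uniform(-1, 1, 6)                  # fresh random genome
    f, c = fitness(g), cell(g)
    if c not in archive or f > archive[c][0]:      # keep the elite per cell
        archive[c] = (f, g)

print(len(archive), "cells filled with distinct play-styles")
```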
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that with low exploration probability values the learned policy is more robust to corrupted rewards.
arXiv Detail & Related papers (2021-02-12T15:53:48Z)
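A small sketch of the reward-perturbation setting above: tabular Q-learning on a two-state chain where an adversary flips the sign of every observed reward. The environment, attack, and hyperparameters are invented for illustration.

```python
# Sketch of a reward-perturbation attack on tabular Q-learning in a 2-state
# chain MDP; the attack and environment here are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(4)
n_states, n_actions, eps, alpha, gamma = 2, 2, 0.1, 0.1, 0.9
Q = np.zeros((n_states, n_actions))

def step(s, a):
    s2 = a                                   # the action chooses the next state
    r = 1.0 if s2 == 1 else 0.0              # true reward favors state 1
    return s2, r

s = 0
for _ in range(20000):
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
    s2, r = step(s, a)
    r_seen = -r                              # adversary flips the reward sign
    Q[s, a] += alpha * (r_seen + gamma * Q[s2].max() - Q[s, a])
    s = s2

print("learned greedy actions:", Q.argmax(axis=1))  # misled toward state 0
```

Varying the exploration probability eps is one way to probe the paper's robustness claim.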
- On the Impossibility of Convergence of Mixed Strategies with No Regret Learning [10.515544361834241]
We study convergence properties of the mixed strategies that result from a general class of optimal no regret learning strategies.
We consider the class of strategies whose information set at each step is the empirical average of the opponent's realized play.
arXiv Detail & Related papers (2020-12-03T18:02:40Z)
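The information structure described above is the one used by classical fictitious play. A sketch on rock-paper-scissors: the empirical averages approach the uniform equilibrium even though the period-by-period strategies keep cycling, which is the flavor of non-convergence the paper studies.

```python
# Fictitious-play sketch: each player best-responds to the empirical average
# of the opponent's realized actions.
import numpy as np

A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])                   # rock-paper-scissors
counts_x, counts_y = np.ones(3), np.ones(3)     # smoothed action counts

for _ in range(5000):
    i = np.argmax(A @ (counts_y / counts_y.sum()))   # row BR to empirical mix
    j = np.argmin((counts_x / counts_x.sum()) @ A)   # column BR to empirical mix
    counts_x[i] += 1
    counts_y[j] += 1

# Empirical averages approach the uniform equilibrium even though the
# realized (pure) strategies keep cycling.
print((counts_x / counts_x.sum()).round(2), (counts_y / counts_y.sum()).round(2))
```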
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Optimal strategies in the Fighting Fantasy gaming system: influencing stochastic dynamics by gambling with limited resource [0.0]
Fighting Fantasy is a popular recreational fantasy gaming system worldwide.
Each round, a limited resource ('luck') may be spent on a gamble to amplify the benefit from a win or mitigate the deficit from a loss.
We use a backward induction approach to solve the Bellman equation for the system and identify the optimal strategy for any given state during the game.
arXiv Detail & Related papers (2020-02-24T11:31:25Z)
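A backward-induction sketch in the spirit of this last paper, assuming a heavily simplified combat where each round is won with probability p and a unit of 'luck' can be gambled to amplify a hit or mitigate a loss. The rules and probabilities are invented for illustration, not the actual Fighting Fantasy mechanics.

```python
# Backward induction on a toy combat with a spendable 'luck' resource;
# rules and probabilities are simplified stand-ins for the paper's model.
from functools import lru_cache

p = 0.5          # chance of winning a round
q = 0.6          # chance a luck gamble succeeds

@lru_cache(maxsize=None)
def V(me, opp, luck):
    """Probability of eventually winning from this state under optimal play."""
    if opp <= 0: return 1.0
    if me <= 0: return 0.0
    no_luck = p * V(me, opp - 1, luck) + (1 - p) * V(me - 1, opp, luck)
    if luck == 0:
        return no_luck
    # Gamble on offense: a successful test doubles the damage dealt.
    off = p * (q * V(me, opp - 2, luck - 1) + (1 - q) * V(me, opp - 1, luck - 1)) \
        + (1 - p) * V(me - 1, opp, luck)
    # Gamble on defense: a successful test halves (here, negates) damage taken.
    dfn = p * V(me, opp - 1, luck) \
        + (1 - p) * (q * V(me, opp, luck - 1) + (1 - q) * V(me - 1, opp, luck - 1))
    return max(no_luck, off, dfn)           # Bellman optimality over the choices

print(round(V(4, 4, 2), 3), "win probability with optimal luck use")
print(round(V(4, 4, 0), 3), "without luck")
```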