Efficient exploration of zero-sum stochastic games
- URL: http://arxiv.org/abs/2002.10524v1
- Date: Mon, 24 Feb 2020 20:30:38 GMT
- Title: Efficient exploration of zero-sum stochastic games
- Authors: Carlos Martin, Tuomas Sandholm
- Abstract summary: We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
- Score: 83.28949556413717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the increasingly important and common game-solving setting
where we do not have an explicit description of the game but only oracle access
to it through gameplay, such as in financial or military simulations and
computer games. During a limited-duration learning phase, the algorithm can
control the actions of both players in order to try to learn the game and how
to play it well. After that, the algorithm has to produce a strategy that has
low exploitability. Our motivation is to quickly learn strategies that have low
exploitability in situations where evaluating the payoffs of a queried strategy
profile is costly. For the stochastic game setting, we propose using the
distribution of state-action value functions induced by a belief distribution
over possible environments. We compare the performance of various exploration
strategies for this task, including generalizations of Thompson sampling and
Bayes-UCB to this new setting. These two consistently outperform other
strategies.
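To make the proposed exploration strategies concrete, below is a minimal single-state sketch, assuming a zero-sum matrix game with Gaussian payoff noise and an entry-wise Gaussian posterior as the belief distribution over environments. Thompson-style exploration samples a game from the posterior and queries the resulting maximin/minimax joint action. The model, constants, and pure-strategy play are illustrative assumptions; the paper's algorithms operate on full stochastic games through distributions over state-action value functions.

```python
# Hypothetical single-state sketch of Thompson-style exploration for a
# zero-sum matrix game with noisy payoffs; NOT the authors' exact algorithm.
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, noise_sd = 3, 3, 1.0
true_game = rng.normal(size=(n_rows, n_cols))        # hidden payoff matrix

# Conjugate Normal posterior per payoff entry (known noise variance).
prior_var = 4.0
post_mean = np.zeros((n_rows, n_cols))
post_var = np.full((n_rows, n_cols), prior_var)

def update(i, j, y):
    """Standard Gaussian conjugate update for entry (i, j)."""
    precision = 1.0 / post_var[i, j] + 1.0 / noise_sd**2
    post_mean[i, j] = (post_mean[i, j] / post_var[i, j] + y / noise_sd**2) / precision
    post_var[i, j] = 1.0 / precision

for _ in range(200):
    sample = rng.normal(post_mean, np.sqrt(post_var))  # Thompson: sample a game
    i = np.argmax(sample.min(axis=1))                  # row maximin under sample
    j = np.argmin(sample.max(axis=0))                  # column minimax under sample
    y = true_game[i, j] + rng.normal(0.0, noise_sd)    # costly oracle query
    update(i, j, y)

# After learning, act on the posterior mean (pure maximin as a crude stand-in
# for an equilibrium of the estimated game).
print("estimated maximin row:", np.argmax(post_mean.min(axis=1)))
```

A Bayes-UCB analogue would replace the posterior sample with an upper posterior quantile for the maximizing player and a lower quantile for the minimizing player.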
Related papers
- Variational Methods for Computing Non-Local Quantum Strategies [1.95414377613382]
In a nonlocal game, two noncommunicating players cooperate to convince a referee that they possess a strategy that does not violate the rules of the game.
We show that our algorithm is capable of generating a short-depth circuit that implements an optimal quantum strategy for a graph coloring game.
arXiv Detail & Related papers (2023-11-02T16:17:18Z)
- ApproxED: Approximate exploitability descent via learned best responses [61.17702187957206]
We study the problem of finding an approximate Nash equilibrium of games with continuous action sets.
We propose two new methods that minimize an approximation of exploitability with respect to the strategy profile.
arXiv Detail & Related papers (2023-01-20T23:55:30Z)
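A toy sketch of the exploitability-descent idea from ApproxED above, with exact best responses in rock-paper-scissors standing in for the paper's learned best responses over continuous action sets; the multiplicative-weights update and step-size schedule are illustrative choices.

```python
# Toy exploitability-descent sketch on a matrix game; ApproxED itself targets
# continuous action sets and uses *learned* best responses instead of exact ones.
import numpy as np

A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])          # rock-paper-scissors payoffs (row maximizes)
rng = np.random.default_rng(1)
x = rng.dirichlet(np.ones(3))          # row player's mixed strategy
y = rng.dirichlet(np.ones(3))          # column player's mixed strategy

def exploitability(x, y):
    # Total gain available to both players from deviating to a best response.
    return (A @ y).max() - (x @ A).min()

for t in range(3000):
    lr = 0.5 / np.sqrt(t + 1)
    j_br = np.argmin(x @ A)            # column's best response to x
    i_br = np.argmax(A @ y)            # row's best response to y
    x = x * np.exp(lr * A[:, j_br]); x /= x.sum()   # ascend vs. opponent's BR
    y = y * np.exp(-lr * A[i_br]);   y /= y.sum()   # descend vs. opponent's BR

# Exploitability should shrink toward 0 as both mix toward uniform.
print(round(exploitability(x, y), 3), x.round(2), y.round(2))
```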
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
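A sketch of the strategy representation this paper describes: a mixed strategy over a continuous action space as a small network that pushes Gaussian noise forward to actions, so expected payoffs can be estimated purely by sampling the game. The architecture and payoff oracle below are hypothetical.

```python
# Sketch: a mixed strategy over a continuous action space represented as a
# network that pushes Gaussian noise forward to actions (names are illustrative).
import numpy as np

rng = np.random.default_rng(2)

class NoisePolicy:
    """Tiny MLP mapping latent noise z to an action in [-1, 1]."""
    def __init__(self, latent_dim=4, hidden=16):
        self.W1 = rng.normal(0, 0.5, (hidden, latent_dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.5, (1, hidden))
    def sample(self, n):
        z = rng.normal(size=(n, self.W1.shape[1]))
        h = np.tanh(z @ self.W1.T + self.b1)
        return np.tanh(h @ self.W2.T).ravel()      # n sampled actions

# Monte Carlo estimate of the expected payoff u(a, b) under two such strategies,
# the only game access a gradient-free method needs.
u = lambda a, b: -np.abs(a - b)                    # hypothetical payoff oracle
p, q = NoisePolicy(), NoisePolicy()
print(u(p.sample(10000), q.sample(10000)).mean())
```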
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so that a quality-diversity algorithm (MAP-Elites) can be used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
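For context on the quality-diversity component above, a minimal MAP-Elites archive loop; the fitness function and behavior descriptor are stand-ins for play strength and play-style, not the paper's setup.

```python
# Minimal MAP-Elites loop; in the paper the genome parameterizes the portfolio
# MCTS agent and "behavior" is its play-style, both simplified away here.
import numpy as np

rng = np.random.default_rng(3)
fitness = lambda g: -np.sum(g**2)                  # stand-in for play strength
behavior = lambda g: g[:2]                         # stand-in play-style descriptor
cell = lambda g: tuple(np.clip(((behavior(g) + 1) * 5).astype(int), 0, 9))

archive = {}                                       # cell -> (fitness, genome)
for _ in range(5000):
    if archive and rng.random() < 0.9:             # mutate a random elite
        g = archive[list(archive)[rng.integers(len(archive))]][1]
        g = np.clip(g + rng.normal(0, 0.1, g.shape), -1, 1)
    else:
        g = rng.uniform(-1, 1, 6)                  # fresh random genome
    f, c = fitness(g), cell(g)
    if c not in archive or f > archive[c][0]:      # keep the elite per cell
        archive[c] = (f, g)

print(len(archive), "cells filled with distinct play-styles")
```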
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that with low exploration probability values the learned policy is more robust to corrupted rewards.
arXiv Detail & Related papers (2021-02-12T15:53:48Z)
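A small sketch of the reward-perturbation setting above: tabular Q-learning on a two-state chain where an adversary flips the sign of every observed reward. The environment, attack, and hyperparameters are invented for illustration.

```python
# Sketch of a reward-perturbation attack on tabular Q-learning in a 2-state
# chain MDP; the attack and environment here are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(4)
n_states, n_actions, eps, alpha, gamma = 2, 2, 0.1, 0.1, 0.9
Q = np.zeros((n_states, n_actions))

def step(s, a):
    s2 = a                                   # the action chooses the next state
    r = 1.0 if s2 == 1 else 0.0              # true reward favors state 1
    return s2, r

s = 0
for _ in range(20000):
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
    s2, r = step(s, a)
    r_seen = -r                              # adversary flips the reward sign
    Q[s, a] += alpha * (r_seen + gamma * Q[s2].max() - Q[s, a])
    s = s2

print("learned greedy actions:", Q.argmax(axis=1))  # misled toward state 0
```

Varying the exploration probability eps is one way to probe the paper's robustness claim.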
- On the Impossibility of Convergence of Mixed Strategies with No Regret Learning [10.515544361834241]
We study convergence properties of the mixed strategies that result from a general class of optimal no regret learning strategies.
We consider the class of strategies whose information set at each step is the empirical average of the opponent's realized play.
arXiv Detail & Related papers (2020-12-03T18:02:40Z)
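The information structure described above is the one used by classical fictitious play. A sketch on rock-paper-scissors: the empirical averages approach the uniform equilibrium even though the period-by-period strategies keep cycling, which is the flavor of non-convergence the paper studies.

```python
# Fictitious-play sketch: each player best-responds to the empirical average
# of the opponent's realized actions.
import numpy as np

A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])                   # rock-paper-scissors
counts_x, counts_y = np.ones(3), np.ones(3)     # smoothed action counts

for _ in range(5000):
    i = np.argmax(A @ (counts_y / counts_y.sum()))   # row BR to empirical mix
    j = np.argmin((counts_x / counts_x.sum()) @ A)   # column BR to empirical mix
    counts_x[i] += 1
    counts_y[j] += 1

# Empirical averages approach the uniform equilibrium even though the
# realized (pure) strategies keep cycling.
print((counts_x / counts_x.sum()).round(2), (counts_y / counts_y.sum()).round(2))
```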
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Optimal strategies in the Fighting Fantasy gaming system: influencing stochastic dynamics by gambling with limited resource [0.0]
Fighting Fantasy is a popular recreational fantasy gaming system worldwide.
Each round, a limited resource ('luck') may be spent on a gamble to amplify the benefit from a win or mitigate the deficit from a loss.
We use a backward induction approach to solve the Bellman equation for the system and identify the optimal strategy for any given state during the game.
arXiv Detail & Related papers (2020-02-24T11:31:25Z)
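A backward-induction sketch in the spirit of this last paper, assuming a heavily simplified combat where each round is won with probability p and a unit of 'luck' can be gambled to amplify a hit or mitigate a loss. The rules and probabilities are invented for illustration, not the actual Fighting Fantasy mechanics.

```python
# Backward induction on a toy combat with a spendable 'luck' resource;
# rules and probabilities are simplified stand-ins for the paper's model.
from functools import lru_cache

p = 0.5          # chance of winning a round
q = 0.6          # chance a luck gamble succeeds

@lru_cache(maxsize=None)
def V(me, opp, luck):
    """Probability of eventually winning from this state under optimal play."""
    if opp <= 0: return 1.0
    if me <= 0: return 0.0
    no_luck = p * V(me, opp - 1, luck) + (1 - p) * V(me - 1, opp, luck)
    if luck == 0:
        return no_luck
    # Gamble on offense: a successful test doubles the damage dealt.
    off = p * (q * V(me, opp - 2, luck - 1) + (1 - q) * V(me, opp - 1, luck - 1)) \
        + (1 - p) * V(me - 1, opp, luck)
    # Gamble on defense: a successful test halves (here, negates) damage taken.
    dfn = p * V(me, opp - 1, luck) \
        + (1 - p) * (q * V(me, opp, luck - 1) + (1 - q) * V(me - 1, opp, luck - 1))
    return max(no_luck, off, dfn)           # Bellman optimality over the choices

print(round(V(4, 4, 2), 3), "win probability with optimal luck use")
print(round(V(4, 4, 0), 3), "without luck")
```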