Learning to Play Stochastic Two-player Perfect-Information Games without
Knowledge
- URL: http://arxiv.org/abs/2302.04318v1
- Date: Wed, 8 Feb 2023 20:27:45 GMT
- Title: Learning to Play Stochastic Two-player Perfect-Information Games without
Knowledge
- Authors: Quentin Cohen-Solal and Tristan Cazenave
- Abstract summary: We extend the Descent framework, which enables learning and planning in the context of two-player games with perfect information.
We evaluate them on the game EinStein wurfelt nicht! against state-of-the-art algorithms.
It is our generalization of Descent which obtains the best results.
- Score: 5.071342645033634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we extend the Descent framework, which enables learning and
planning in the context of two-player games with perfect information, to the
framework of stochastic games.
We propose two ways of doing this: the first generalizes the search
algorithm, i.e. Descent, to stochastic games; the second approximates
stochastic games by deterministic games.
We then evaluate them on the game EinStein wurfelt nicht! against
state-of-the-art algorithms: Expectiminimax and Polygames (i.e. the Alpha Zero
algorithm). It is our generalization of Descent which obtains the best results.
The approximation by deterministic games nevertheless obtains good results,
presaging that it could give better results in particular contexts.
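The Expectiminimax baseline named in the abstract is the standard extension of minimax to games with chance nodes: max and min nodes back up the best value for each player, while chance nodes back up the probability-weighted average of their children. A minimal sketch over a hand-built toy tree (the tree and all names here are illustrative, not the paper's Descent framework or its EinStein wurfelt nicht! implementation):

```python
# Toy expectiminimax over an explicit game tree.
# Node encodings (hypothetical, for illustration only):
#   ("leaf", value)                        terminal payoff for the max player
#   ("max", [child, ...])                  max player to move
#   ("min", [child, ...])                  min player to move
#   ("chance", [(prob, child), ...])       stochastic event, e.g. a die roll

def expectiminimax(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "max":
        return max(expectiminimax(c) for c in node[1])
    if kind == "min":
        return min(expectiminimax(c) for c in node[1])
    # Chance node: expected value over the outcome distribution.
    return sum(p * expectiminimax(c) for p, c in node[1])

# The max player picks between two moves whose outcomes a fair coin decides.
tree = ("max", [
    ("chance", [(0.5, ("leaf", 1.0)), (0.5, ("leaf", 0.0))]),  # EV 0.5
    ("chance", [(0.5, ("leaf", 0.8)), (0.5, ("leaf", 0.4))]),  # EV 0.6
])
print(expectiminimax(tree))  # expected value of the best move, ~0.6
```

In a real engine the tree is generated lazily from game rules and the recursion is cut off at a depth limit with a learned or hand-crafted evaluation at the frontier; the paper's alternative of approximating the stochastic game by a deterministic one amounts to fixing chance outcomes before the search rather than averaging over them.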
Related papers
- Generalised agent for solving higher board states of tic tac toe using
Reinforcement Learning [0.0]
This study is aimed at providing a generalized algorithm for higher board states of tic tac toe to make precise moves in a short period.
The idea is to pose the tic tac toe game as a well-posed learning problem.
The study and its results are promising, giving a high win to draw ratio with each epoch of training.
arXiv Detail & Related papers (2022-12-23T10:58:27Z) - No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z) - Revisiting Game Representations: The Hidden Costs of Efficiency in
Sequential Decision-making Algorithms [0.6749750044497732]
Recent advancements in algorithms for sequential decision-making under imperfect information have shown remarkable success in large games.
These algorithms traditionally formalize the games using the extensive-form game formalism.
We show that a popular workaround involves using a specialized representation based on player specific information-state trees.
arXiv Detail & Related papers (2021-12-20T22:34:19Z) - Student of Games: A unified learning algorithm for both perfect and
imperfect information games [22.97853623156316]
Student of Games is an algorithm that combines guided search, self-play learning, and game-theoretic reasoning.
We prove that Student of Games is sound, converging to perfect play as available computation and approximation capacity increases.
Student of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker, and defeats the state-of-the-art agent in Scotland Yard.
arXiv Detail & Related papers (2021-12-06T17:16:24Z) - Completeness of Unbounded Best-First Game Algorithms [0.0]
We prove the completeness of two game search algorithms: best-first minimax with completion and descent with completion.
We generalize these algorithms in the context of perfect information multiplayer games.
arXiv Detail & Related papers (2021-09-11T23:13:52Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - Computing Nash Equilibria in Multiplayer DAG-Structured Stochastic Games
with Persistent Imperfect Information [1.7132914341329848]
We present an algorithm for approximating Nash equilibrium in multiplayer general-sum games with persistent imperfect information.
Using a new procedure, we are able to demonstrate that our algorithm computes a strategy that closely approximates Nash equilibrium in this game.
arXiv Detail & Related papers (2020-10-26T19:27:26Z) - Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies
in Extensive-Form Zero-Sum Games [123.76716667704625]
We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game.
In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game.
We provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile.
arXiv Detail & Related papers (2020-09-21T17:51:57Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include algorithm's regret guarantees that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z) - Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z) - From Poincaré Recurrence to Convergence in Imperfect Information
Games: Finding Equilibrium via Regularization [49.368421783733815]
We show how adapting the reward can give strong convergence guarantees in monotone games.
We also show how this reward adaptation technique can be leveraged to build algorithms that converge exactly to the Nash equilibrium.
arXiv Detail & Related papers (2020-02-19T21:36:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.