Learning to Play Two-Player Perfect-Information Games without Knowledge
- URL: http://arxiv.org/abs/2008.01188v4
- Date: Mon, 31 Mar 2025 17:50:04 GMT
- Title: Learning to Play Two-Player Perfect-Information Games without Knowledge
- Authors: Quentin Cohen-Solal
- Abstract summary: In this paper, several techniques for learning game state evaluation functions by reinforcement are proposed. The first is a generalization of tree bootstrapping (tree learning): it is adapted to the context of reinforcement learning without knowledge. The second is a modification of minimax with unbounded depth extending the best sequences of actions to the terminal states.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, several techniques for learning game state evaluation functions by reinforcement are proposed. The first is a generalization of tree bootstrapping (tree learning): it is adapted to the context of reinforcement learning without knowledge based on non-linear functions. With this technique, no information is lost during the reinforcement learning process. The second is a modification of minimax with unbounded depth extending the best sequences of actions to the terminal states. This modified search is intended to be used during the learning process. The third is to replace the classic gain of a game (+1 / -1) with a reinforcement heuristic. We study particular reinforcement heuristics such as quick wins and slow defeats, scoring, and mobility or presence. The fourth is a new action selection distribution. The conducted experiments suggest that these techniques improve the level of play. Finally, we apply these different techniques to design program-players for the game of Hex (size 11 and 13), surpassing the level of Mohex 3HNN with reinforcement learning from self-play without knowledge.
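As a rough illustration of two of these ideas, the minimal sketch below combines tree learning (every node of the search tree contributes a training target, not only the root) with a depth-aware terminal heuristic in the spirit of quick wins and slow defeats. It assumes a hypothetical Game interface (state, is_terminal, winner, legal_moves, play) and a learned evaluation function `evaluate`; the paper's actual search is an unbounded best-first minimax, whereas a fixed budget is used here only to keep the example short.

```python
# Hedged sketch: tree learning with a "quick wins and slow defeats" terminal
# heuristic. All names (Game methods, evaluate, MAX_SCORE) are assumptions.

MAX_SCORE = 100.0  # assumed bound on terminal values

def terminal_value(game, depth):
    """Early wins score higher than late wins; late losses score higher
    than early losses (quick wins and slow defeats)."""
    if game.winner() == +1:      # win for the maximizing player
        return MAX_SCORE - depth
    if game.winner() == -1:      # loss for the maximizing player
        return -MAX_SCORE + depth
    return 0.0                   # draw

def search(game, depth, maximizing, evaluate, budget, samples):
    """Minimax that records a (state, backed-up value) pair for every node
    it visits, so the whole tree becomes training data (tree learning)
    instead of only the root value."""
    if game.is_terminal():
        value = terminal_value(game, depth)
    elif budget == 0:
        value = evaluate(game.state())          # leaf: learned evaluation
    else:
        children = [game.play(m) for m in game.legal_moves()]
        values = [search(c, depth + 1, not maximizing, evaluate,
                         budget - 1, samples) for c in children]
        value = max(values) if maximizing else min(values)
    samples.append((game.state(), value))       # one training target per node
    return value
```

After each self-play game, the accumulated (state, value) pairs would be used to refit `evaluate`, which is the sense in which no search information is discarded.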
Related papers
- In-Context Exploiter for Extensive-Form Games [38.24471816329584]
We introduce a novel method, In-Context Exploiter (ICE), to train a single model that can act as any player in the game and adaptively exploit opponents entirely by in-context learning.
Our ICE algorithm involves generating diverse opponent strategies, collecting interactive history data by a reinforcement learning algorithm, and training a transformer-based agent within a well-designed curriculum learning framework.
arXiv Detail & Related papers (2024-08-10T14:59:09Z)
- Accelerating Inverse Reinforcement Learning with Expert Bootstrapping [13.391861125428234]
We show that better utilization of expert demonstrations can reduce the need for hard exploration in the inner RL loop.
Specifically, we propose two simple recipes: (1) placing expert transitions into the replay buffer of the inner RL algorithm (e.g. Soft Actor-Critic), which directly informs the learner about high-reward states instead of forcing the learner to discover them through extensive exploration, and (2) using expert actions in Q value bootstrapping to improve the target Q value estimates and more accurately describe high-value expert states.
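A minimal sketch of the first recipe is shown below, assuming a generic replay buffer and expert trajectories stored as (state, action, reward, next_state, done) tuples; the class and function names are illustrative, not the paper's actual API.

```python
# Hedged sketch: seeding the inner RL algorithm's replay buffer with expert
# transitions so high-reward states are visible without hard exploration.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling; expert and learner transitions are mixed freely.
        return random.sample(self.storage, min(batch_size, len(self.storage)))

def seed_with_expert(buffer, expert_trajectories):
    """Place every expert transition in the learner's buffer before training
    starts, so the critic can bootstrap from high-value expert states."""
    for trajectory in expert_trajectories:
        for transition in trajectory:
            buffer.add(*transition)
```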
arXiv Detail & Related papers (2024-02-04T20:49:53Z)
- ApproxED: Approximate exploitability descent via learned best responses [61.17702187957206]
We study the problem of finding an approximate Nash equilibrium of games with continuous action sets.
We propose two new methods that minimize an approximation of exploitability with respect to the strategy profile.
arXiv Detail & Related papers (2023-01-20T23:55:30Z)
- A Ranking Game for Imitation Learning [22.028680861819215]
We treat imitation as a two-player ranking-based Stackelberg game between a $\textit{policy}$ and a $\textit{reward}$ function.
This game encompasses a large subset of both inverse reinforcement learning (IRL) methods and methods which learn from offline preferences.
We theoretically analyze the requirements of the loss function used for ranking policy performances to facilitate near-optimal imitation learning at equilibrium.
arXiv Detail & Related papers (2022-02-07T19:38:22Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Hierarchical Skills for Efficient Exploration [70.62309286348057]
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration.
Prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design.
We propose a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
arXiv Detail & Related papers (2021-10-20T22:29:32Z)
- Learning to Play Soccer From Scratch: Sample-Efficient Emergent Coordination through Curriculum-Learning and Competition [1.675857332621569]
This work proposes a scheme that allows learning complex multi-agent behaviors in a sample efficient manner, applied to 2v2 soccer.
The problem is formulated as a Markov game, and solved using deep reinforcement learning.
Our results show that high quality soccer play can be obtained with our approach in just under 40M interactions.
arXiv Detail & Related papers (2021-03-09T01:57:16Z)
- Reinforcement Learning with Dual-Observation for General Video Game Playing [12.33685708449853]
The General Video Game AI Learning Competition aims to develop agents capable of learning to play different game levels unseen during training.
This paper summarises the five editions of the competition held over five years.
We present a novel reinforcement learning technique with dual-observation for general video game playing.
arXiv Detail & Related papers (2020-11-11T08:28:20Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game [71.24825724518847]
We present a study on how popular reinforcement learning algorithms can be adapted to learn and to play a real-world implementation of a competitive multiplayer card game.
We propose specific training and validation routines for the learning agents in order to evaluate how the agents learn to be competitive, and explain how they adapt to each other's playing style.
arXiv Detail & Related papers (2020-04-08T14:11:05Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
- Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.