Impartial Games: A Challenge for Reinforcement Learning
- URL: http://arxiv.org/abs/2205.12787v4
- Date: Sun, 14 Jan 2024 12:12:06 GMT
- Title: Impartial Games: A Challenge for Reinforcement Learning
- Authors: Bei Zhou and Søren Riis
- Abstract summary: We show that AlphaZero-style reinforcement learning algorithms face challenges on impartial games where players share pieces.
We show that Nim can be learned on small boards, but the learning progress of AlphaZero-style algorithms dramatically slows down when the board size increases.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While AlphaZero-style reinforcement learning (RL) algorithms excel in various
board games, in this paper we show that they face challenges on impartial games
where players share pieces. We present a concrete example of a game - namely
the children's game of Nim - and other impartial games that seem to be a
stumbling block for AlphaZero-style and similar self-play reinforcement
learning algorithms.
Our work builds on the difficulty neural networks have in learning parity
functions, a difficulty shaped by the intricacies of the data distribution and
exacerbated by noisy labels. Our findings are consistent with recent studies
showing that AlphaZero-style algorithms are vulnerable to adversarial attacks
and adversarial perturbations, underscoring the difficulty of learning to
master these games in all legal states.
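To make the parity-learning obstacle concrete, here is a minimal sketch (our illustration; `parity_dataset` and its parameters are hypothetical, not from the paper) of the kind of supervised target involved: each label is the parity of all input bits, optionally corrupted by label noise, and no strict subset of the bits carries any correlation with the clean label.

```python
import numpy as np

def parity_dataset(n_samples, n_bits, label_noise=0.1, seed=0):
    """Labels are the parity (XOR) of all input bits, with optional noise.

    Any strict subset of the bits is statistically independent of the
    clean label, so no partial view of the input predicts the target.
    """
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_samples, n_bits))
    y = X.sum(axis=1) % 2                       # parity of all bits
    flip = rng.random(n_samples) < label_noise  # noisy-label corruption
    return X, np.where(flip, 1 - y, y)

X, y = parity_dataset(10_000, n_bits=20)
print(np.corrcoef(X[:, 0], y)[0, 1])  # ~0: single bits are uninformative
```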
We show that Nim can be learned on small boards, but the learning progress of
AlphaZero-style algorithms dramatically slows down when the board size
increases. Intuitively, the difference between impartial games like Nim and
partisan games like Chess and Go can be explained as follows: in an impartial
game, covering even a small part of the board typically makes it impossible to
predict whether the position is won or lost, because there is often zero
correlation between the visible part of a partly blanked-out position and its
correct evaluation. This starkly contrasts with partisan games, where a partly
blanked-out board position typically provides abundant, or at least
non-trivial, information about the value of the fully uncovered position.
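The blanked-out-board intuition can be checked exactly for Nim: by Bouton's theorem, the player to move loses under optimal play precisely when the XOR of the pile sizes (the nim-sum) is zero, so hiding a single pile leaves the visible piles with no information about the position's value. A minimal sketch (our illustration, not code from the paper; `nim_is_lost` is a hypothetical helper name):

```python
from functools import reduce
from operator import xor

def nim_is_lost(piles):
    """Bouton's theorem: the player to move loses under optimal play
    iff the nim-sum (XOR of all pile sizes) is zero."""
    return reduce(xor, piles, 0) == 0

print(nim_is_lost([1, 2, 3]))  # True:  1 ^ 2 ^ 3 == 0
print(nim_is_lost([1, 2, 4]))  # False: 1 ^ 2 ^ 4 == 7

# Mask one pile and the rest tell you nothing: [1, 2, ?] is lost
# only if the hidden pile happens to be exactly 3.
```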
Related papers
- In-Context Exploiter for Extensive-Form Games [38.24471816329584]
We introduce a novel method, In-Context Exploiter (ICE), to train a single model that can act as any player in the game and adaptively exploit opponents entirely by in-context learning.
Our ICE algorithm involves generating diverse opponent strategies, collecting interactive history data by a reinforcement learning algorithm, and training a transformer-based agent within a well-designed curriculum learning framework.
arXiv Detail & Related papers (2024-08-10T14:59:09Z)
- State-Constrained Zero-Sum Differential Games with One-Sided Information [19.964883571758502]
We study zero-sum differential games with state constraints and one-sided information.
Our contribution is an extension to games with state constraints and the derivation of the primal and dual subdynamic principles necessary for computing behavioral strategies.
arXiv Detail & Related papers (2024-03-05T07:51:38Z)
- Guarantees for Self-Play in Multiplayer Games via Polymatrix Decomposability [2.2636685010313364]
Self-play is a technique for machine learning in multi-agent systems where a learning algorithm learns by interacting with copies of itself.
We show that in two-player constant-sum games, self-play that reaches Nash equilibrium is guaranteed to produce strategies that perform well against any post-training opponent.
For the first time, our results identify a structural property of multiplayer games that enables performance guarantees for the strategies produced by a broad class of self-play algorithms.
arXiv Detail & Related papers (2023-10-17T18:33:21Z)
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Online Learning in Unknown Markov Games [55.07327246187741]
We study online learning in unknown Markov games.
We show that achieving sublinear regret against the best response in hindsight is statistically hard.
We present an algorithm that achieves a sublinear $\tilde{\mathcal{O}}(K^{2/3})$ regret after $K$ episodes.
arXiv Detail & Related papers (2020-10-28T14:52:15Z)
- Combining Deep Reinforcement Learning and Search for Imperfect-Information Games [30.520629802135574]
We present ReBeL, a framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in zero-sum games.
We also show that ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker while using far less domain knowledge than any prior poker AI.
arXiv Detail & Related papers (2020-07-27T15:21:22Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
- Smooth markets: A basic mechanism for organizing gradient-based learners [47.34060971879986]
We introduce smooth markets (SM-games), a class of n-player games with pairwise zero-sum interactions.
SM-games codify a common design pattern in machine learning that includes (some) GANs, adversarial training, and other recent algorithms.
We show that SM-games are amenable to analysis and optimization using first-order methods.
arXiv Detail & Related papers (2020-01-14T09:19:39Z)