Impartial Games: A Challenge for Reinforcement Learning
- URL: http://arxiv.org/abs/2205.12787v5
- Date: Sun, 03 Aug 2025 09:36:35 GMT
- Title: Impartial Games: A Challenge for Reinforcement Learning
- Authors: Bei Zhou, Søren Riis
- Abstract summary: We showcase that AlphaZero-style reinforcement learning algorithms encounter significant and fundamental challenges when applied to impartial games. Our findings reveal that while AlphaZero-style agents can achieve champion-level play, their learning progression severely degrades as board size increases. These results align with broader concerns regarding AlphaZero-style algorithms' vulnerability to adversarial attacks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AlphaZero-style reinforcement learning (RL) algorithms have achieved superhuman performance in many complex board games such as Chess, Shogi, and Go. However, we showcase that these algorithms encounter significant and fundamental challenges when applied to impartial games, a class where players share game pieces and optimal strategy often relies on abstract mathematical principles. Specifically, we utilize the game of Nim as a concrete and illustrative case study to reveal critical limitations of AlphaZero-style and similar self-play RL algorithms. We introduce a novel conceptual framework distinguishing between champion and expert mastery to evaluate RL agent performance. Our findings reveal that while AlphaZero-style agents can achieve champion-level play on very small Nim boards, their learning progression severely degrades as the board size increases. This difficulty stems not merely from complex data distributions or noisy labels, but from a deeper representational bottleneck: the inherent struggle of generic neural networks to implicitly learn abstract, non-associative functions like parity, which are crucial for optimal play in impartial games. This limitation causes a critical breakdown in the positive feedback loop essential for self-play RL, preventing effective learning beyond rote memorization of frequently observed states. These results align with broader concerns regarding AlphaZero-style algorithms' vulnerability to adversarial attacks, highlighting their inability to truly master all legal game states. Our work underscores that simple hyperparameter adjustments are insufficient to overcome these challenges, establishing a crucial foundation for the development of fundamentally novel algorithmic approaches, potentially involving neuro-symbolic or meta-learning paradigms, to bridge the gap towards true expert-level AI in combinatorial games.
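For context, optimal Nim play follows Bouton's theorem: the player to move loses under optimal play exactly when the bitwise XOR of the pile sizes (the nim-sum) is zero. The sketch below is a minimal plain-Python illustration of this parity-style computation, the kind of abstract function the paper argues generic networks fail to learn implicitly; it is not code from the paper.

```python
from functools import reduce

def nim_sum(piles):
    """Bitwise XOR of all pile sizes (Bouton's theorem)."""
    return reduce(lambda a, b: a ^ b, piles, 0)

def optimal_move(piles):
    """Return (pile_index, new_size) that restores a zero nim-sum,
    or None if the position is already losing under optimal play."""
    s = nim_sum(piles)
    if s == 0:
        return None  # every move hands the opponent a winning position
    for i, p in enumerate(piles):
        target = p ^ s  # pile size that zeroes the overall nim-sum
        if target < p:  # guaranteed to exist for some pile when s != 0
            return i, target

print(nim_sum([3, 4, 5]))       # 2 -> nonzero, so the first player wins
print(optimal_move([3, 4, 5]))  # (0, 1): reduce pile 0 from 3 to 1
```

In the paper's terms, an expert-level agent must compute this function on every legal position, whereas a champion-level agent only needs it on positions that actually arise in play.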
Related papers
- Mastering NIM and Impartial Games with Weak Neural Networks: An AlphaZero-inspired Multi-Frame Approach [0.0]
This paper provides a theoretical framework that validates and explains the empirical results of the companion work with Bei Zhou.
We show that AlphaZero-style reinforcement learning algorithms struggle to learn optimal play in NIM.
arXiv Detail & Related papers (2024-11-10T09:34:26Z)
- In-Context Exploiter for Extensive-Form Games [38.24471816329584]
We introduce a novel method, In-Context Exploiter (ICE), to train a single model that can act as any player in the game and adaptively exploit opponents entirely by in-context learning.
Our ICE algorithm involves generating diverse opponent strategies, collecting interactive history data by a reinforcement learning algorithm, and training a transformer-based agent within a well-designed curriculum learning framework.
arXiv Detail & Related papers (2024-08-10T14:59:09Z)
- Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach [2.3020018305241337]
This paper is the first to propose considering robust reinforcement learning (RRL) problems within the framework of positional differential game theory.
Namely, we prove that under Isaacs's condition, the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations.
We present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.
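For orientation, the two Bellman equations in question can be sketched in generic discounted zero-sum notation (reward $r$, discount $\gamma$, controls $u$ for the maximizer and $v$ for the minimizer; this notation is assumed here, not taken from the paper):

```latex
% Minimax vs. maximin Bellman equations (generic notation).
\begin{aligned}
Q_{\min\max}(s,u,v) &= r(s,u,v) + \gamma\,\mathbb{E}_{s'}\!\left[\min_{v'}\max_{u'} Q_{\min\max}(s',u',v')\right],\\
Q_{\max\min}(s,u,v) &= r(s,u,v) + \gamma\,\mathbb{E}_{s'}\!\left[\max_{u'}\min_{v'} Q_{\max\min}(s',u',v')\right].
\end{aligned}
% Under Isaacs's condition the inner min-max and max-min values coincide,
% which is what lets a single Q-function approximately satisfy both.
```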
arXiv Detail & Related papers (2024-05-03T12:21:43Z)
- State-Constrained Zero-Sum Differential Games with One-Sided Information [19.964883571758502]
We study zero-sum differential games with state constraints and one-sided information.
Our contribution is an extension to games with state constraints and the derivation of the primal and dual subdynamic principles necessary for computing behavioral strategies.
arXiv Detail & Related papers (2024-03-05T07:51:38Z)
- Exploring Parity Challenges in Reinforcement Learning through Curriculum Learning with Noisy Labels [0.0]
We propose a simulated learning process structured within a curriculum learning framework and augmented with noisy labels.
This approach thoroughly analyses how neural networks (NNs) adapt and evolve from elementary to increasingly complex game positions.
arXiv Detail & Related papers (2023-12-08T21:32:39Z)
- Guarantees for Self-Play in Multiplayer Games via Polymatrix Decomposability [2.2636685010313364]
Self-play is a technique for machine learning in multi-agent systems where a learning algorithm learns by interacting with copies of itself.
We show that in two-player constant-sum games, self-play that reaches Nash equilibrium is guaranteed to produce strategies that perform well against any post-training opponent.
For the first time, our results identify a structural property of multiplayer games that enables performance guarantees for the strategies produced by a broad class of self-play algorithms.
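In the two-player constant-sum case this guarantee is essentially the minimax property: writing the row player's payoff matrix as $A$ over mixed-strategy simplices (generic notation, assumed here), a Nash strategy $x^{*}$ secures the game value $v$ against every post-training opponent $y$:

```latex
\min_{y \in \Delta_m} (x^{*})^{\top} A\, y
\;=\; \max_{x \in \Delta_n} \min_{y \in \Delta_m} x^{\top} A\, y
\;=\; v .
```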
arXiv Detail & Related papers (2023-10-17T18:33:21Z)
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
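As a rough illustration of the representation (architecture and sizes below are hypothetical, not the paper's), a randomized policy network pushes latent noise through a small network, so the induced distribution over actions is an unrestricted mixed strategy; since only payoffs are queried, the parameters can be updated gradient-free, e.g. with evolution-strategies-style perturbations.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_policy(noise_dim=4, hidden=16, action_dim=2):
    """Tiny two-layer network; its pushforward of Gaussian noise
    defines a mixed strategy over a continuous action space."""
    return {"W1": rng.normal(0.0, 0.5, (noise_dim, hidden)),
            "W2": rng.normal(0.0, 0.5, (hidden, action_dim))}

def sample_actions(params, n):
    z = rng.normal(size=(n, params["W1"].shape[0]))  # latent noise
    h = np.tanh(z @ params["W1"])
    return np.tanh(h @ params["W2"])                 # actions in (-1, 1)^d

actions = sample_actions(init_policy(), n=5)
print(actions.shape)  # (5, 2): five samples drawn from the mixed strategy
```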
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
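The two-layer pattern itself is a standard online-learning construction; below is a minimal Hedge-style sketch of a meta-learner over black-box base-learners (generic, not the paper's specific algorithm).

```python
import numpy as np

class HedgeMeta:
    """Exponential-weights meta-learner over K black-box base-learners."""

    def __init__(self, n_base, eta=0.5):
        self.w = np.ones(n_base) / n_base  # weights over base-learners
        self.eta = eta

    def combine(self, base_predictions):
        # Meta-decision: weighted combination of the base-learners' plays.
        return self.w @ base_predictions

    def update(self, base_losses):
        # Downweight base-learners in proportion to their observed loss.
        self.w *= np.exp(-self.eta * np.asarray(base_losses))
        self.w /= self.w.sum()

meta = HedgeMeta(n_base=3)
for t in range(5):
    preds = np.array([0.1, 0.5, 0.9])            # stand-in base predictions
    decision = meta.combine(preds)               # meta-algorithm's play
    meta.update(base_losses=(preds - 0.9) ** 2)  # loss: distance to target
print(meta.w)  # mass concentrates on the best base-learner
```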
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Learning Generative Deception Strategies in Combinatorial Masking Games [27.2744631811653]
One way deception can be employed is through obscuring, or masking, some of the information about how systems are configured.
We present a novel game-theoretic model of the resulting defender-attacker interaction, where the defender chooses a subset of attributes to mask, while the attacker responds by choosing an exploit to execute.
We present a novel highly scalable approach for approximately solving such games by representing the strategies of both players as neural networks.
arXiv Detail & Related papers (2021-09-23T20:42:44Z)
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards are able to mislead the learner, and that using low exploration probability values makes the learned policy more robust to corrupted rewards.
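To make the threat model concrete, a reward-perturbation attack can be phrased as a thin wrapper around any environment; the interface and noise model below are illustrative assumptions, and a crafted attack would choose perturbations adversarially rather than at random.

```python
import random

class CorruptedRewardEnv:
    """Wraps an environment and perturbs the reward signal the learner sees."""

    def __init__(self, env, flip_prob=0.1, noise_scale=0.5):
        self.env = env
        self.flip_prob = flip_prob
        self.noise_scale = noise_scale

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if random.random() < self.flip_prob:
            reward = -reward                           # sign-flip corruption
        reward += random.gauss(0.0, self.noise_scale)  # smooth perturbation
        return obs, reward, done, info
```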
arXiv Detail & Related papers (2021-02-12T15:53:48Z)
- Online Learning in Unknown Markov Games [55.07327246187741]
We study online learning in unknown Markov games.
We show that achieving sublinear regret against the best response in hindsight is statistically hard.
We present an algorithm that achieves a sublinear $\tilde{\mathcal{O}}(K^{2/3})$ regret after $K$ episodes.
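For reference, the best-response-in-hindsight benchmark behind the hardness statement can be written as follows (generic notation assumed here), where $\nu_k$ is the opponent's policy in episode $k$; sublinear regret means $\mathrm{Reg}(K)/K \to 0$:

```latex
\mathrm{Reg}(K) \;=\; \max_{\pi} \sum_{k=1}^{K} V^{\pi,\,\nu_k}(s_1)
\;-\; \sum_{k=1}^{K} V^{\pi_k,\,\nu_k}(s_1).
```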
arXiv Detail & Related papers (2020-10-28T14:52:15Z)
- Combining Deep Reinforcement Learning and Search for Imperfect-Information Games [30.520629802135574]
We present ReBeL, a framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in two-player zero-sum games.
We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.
arXiv Detail & Related papers (2020-07-27T15:21:22Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Warm-Start AlphaZero Self-Play Search Enhancements [5.096685900776467]
Recently, AlphaZero has achieved landmark results in deep reinforcement learning.
We propose a novel approach to deal with the cold-start problem of self-play by employing simple search enhancements.
Our experiments indicate that most of these enhancements improve the performance of their baseline player in three different (small) board games.
arXiv Detail & Related papers (2020-04-26T11:48:53Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
- Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
- Smooth markets: A basic mechanism for organizing gradient-based learners [47.34060971879986]
We introduce smooth markets (SM-games), a class of n-player games with pairwise zero sum interactions.
SM-games codify a common design pattern in machine learning that includes (some) GANs, adversarial training, and other recent algorithms.
We show that SM-games are amenable to analysis and optimization using first-order methods.
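One common way to write the pairwise zero-sum condition (notation assumed here): player $i$'s loss decomposes into a self-interaction term plus pairwise terms that cancel across each pair of players, so value is conserved in every interaction.

```latex
\ell_i(w) \;=\; \ell_{ii}(w) + \sum_{j \neq i} \ell_{ij}(w),
\qquad
\ell_{ij}(w) + \ell_{ji}(w) \;=\; 0 \quad \text{for all } i \neq j .
```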
arXiv Detail & Related papers (2020-01-14T09:19:39Z)