SpinGPT: A Large-Language-Model Approach to Playing Poker Correctly
- URL: http://arxiv.org/abs/2509.22387v1
- Date: Fri, 26 Sep 2025 14:15:44 GMT
- Title: SpinGPT: A Large-Language-Model Approach to Playing Poker Correctly
- Authors: Narada Maugin, Tristan Cazenave
- Abstract summary: We present SpinGPT, the first Large Language Model tailored to Spin & Go, a popular three-player online poker format. Our results show that SpinGPT matches the solver's actions in 78% of decisions (tolerant accuracy). These results suggest that LLMs could be a new way to deal with multi-player imperfect-information games like poker.
- Score: 2.5788559173418357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Counterfactual Regret Minimization (CFR) algorithm and its variants have enabled the development of pokerbots capable of beating the best human players in heads-up (1v1) cash games and of competing with them in six-player formats. However, CFR's computational complexity rises exponentially with the number of players. Furthermore, in games with three or more players, following a Nash equilibrium no longer guarantees a non-losing outcome. These limitations, among others, significantly restrict the applicability of CFR to the most popular format: tournaments. Motivated by the recent success of Large Language Models (LLMs) in chess and Diplomacy, we present SpinGPT, the first LLM tailored to Spin & Go, a popular three-player online poker format. SpinGPT is trained in two stages: (1) Supervised Fine-Tuning on 320k high-stakes expert decisions; (2) Reinforcement Learning on 270k solver-generated hands. Our results show that SpinGPT matches the solver's actions in 78% of decisions (tolerant accuracy). With a simple deep-stack heuristic, it achieves 13.4 +/- 12.9 BB/100 against Slumbot in heads-up play over 30,000 hands (95% CI). These results suggest that LLMs could be a new way to approach multi-player imperfect-information games like poker.
Related papers
- How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use [52.394999779049606]
Large Language Models (LLMs) are increasingly applied in high-stakes domains, yet LLMs fail to compete against traditional algorithms. We propose ToolPoker, a tool-integrated reasoning framework.
arXiv Detail & Related papers (2026-01-31T05:45:25Z) - Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning [0.5249805590164902]
We present Solly, the first AI agent to achieve elite human play in reduced-format Liar's Poker. We trained Solly using self-play with a model-free, actor-critic, deep reinforcement learning algorithm. Solly played at an elite human level as measured by win rate (won over 50% of hands) and equity (money won) in heads-up and multi-player Liar's Poker.
arXiv Detail & Related papers (2025-11-05T18:58:18Z) - Beyond Game Theory Optimal: Profit-Maximizing Poker Agents for No-Limit Hold'em [0.06610877051761614]
Counterfactual Regret Minimization (CFR) performs best in heads-up situations, and CFR remains the strongest method in most multi-way situations. Our approach aims to show how poker agents can move from merely not losing to consistently winning against diverse opponents.
arXiv Detail & Related papers (2025-09-28T08:51:57Z) - PokerBench: Training Large Language Models to become Professional Poker Players [3.934572858193348]
We introduce PokerBench, a benchmark for evaluating the poker-playing abilities of large language models (LLMs). Poker, an incomplete-information game, demands a multitude of skills such as mathematics, reasoning, planning, strategy, and a deep understanding of game theory and human psychology. PokerBench consists of a comprehensive compilation of the 11,000 most important scenarios, split between pre-flop and post-flop play.
arXiv Detail & Related papers (2025-01-14T18:59:03Z) - Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of multi-player multi-armed bandit (MAB) model, which captures arrival of requests to each arm and the policy of allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z) - PokerGPT: An End-to-End Lightweight Solver for Multi-Player Texas Hold'em via Large Language Model [14.14786217204364]
Poker, also known as Texas Hold'em, has always been a typical research target within imperfect-information games (IIGs).
We introduce PokerGPT, an end-to-end solver for playing Texas Hold'em with an arbitrary number of players and achieving high win rates.
arXiv Detail & Related papers (2024-01-04T13:27:50Z) - Regret Matching+: (In)Stability and Fast Convergence in Games [68.13214224119024]
We show that RM+ and its predictive version can be unstable, which might cause other players to suffer large regret.
We show that these fixes are sufficient to get $O(T^{1/4})$ individual regret and $O(1)$ social regret in normal-form games via RM+ with predictions.
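For context on the algorithm under discussion: plain Regret Matching+ keeps a vector of cumulative regrets thresholded at zero and plays each action in proportion to its positive regret. A minimal single-player sketch for a normal-form game (the paper's predictive variant and stability fixes are not shown):

```python
def rm_plus_step(regrets, payoffs):
    """One Regret Matching+ update.

    regrets: cumulative regrets, one per action (kept non-negative by RM+).
    payoffs: this round's counterfactual payoff for each action.
    Returns the strategy played this round and the updated regrets.
    """
    total = sum(regrets)
    if total > 0:
        # Play each action in proportion to its positive cumulative regret.
        strategy = [r / total for r in regrets]
    else:
        # No positive regret yet: fall back to the uniform strategy.
        strategy = [1.0 / len(regrets)] * len(regrets)
    expected = sum(p * s for p, s in zip(payoffs, strategy))
    # The "+" in RM+: clip cumulative regret at zero after every update.
    new_regrets = [max(0.0, r + p - expected) for r, p in zip(regrets, payoffs)]
    return strategy, new_regrets
```

Starting from zero regrets, the update concentrates play on whichever action has outperformed the strategy's own expected payoff so far; the clipping step is what distinguishes RM+ from vanilla regret matching.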
arXiv Detail & Related papers (2023-05-24T04:26:21Z) - Mastering Strategy Card Game (Hearthstone) with Improved Techniques [8.399453146308502]
Strategy card games demand intelligent game-play and can be an ideal test-bench for AI.
Previous work combines an end-to-end policy function and an optimistic smooth fictitious play.
In this work, we apply such algorithms to Hearthstone, a famous commercial game that is more complicated in game rules and mechanisms.
arXiv Detail & Related papers (2023-03-09T11:52:52Z) - Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z) - Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning [86.37438204416435]
Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered.
Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome.
DeepNash beats existing state-of-the-art AI methods in Stratego and has achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform.
arXiv Detail & Related papers (2022-06-30T15:53:19Z) - A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games [104.3339905200105]
This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm.
Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games.
arXiv Detail & Related papers (2022-06-12T19:49:14Z) - Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games [31.97631243571394]
We introduce a framework, LMAC, that automates the discovery of the update rule without explicit human design.
Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance.
We show that LMAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO.
arXiv Detail & Related papers (2021-06-04T22:30:25Z) - Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so that a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.