Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning
- URL: http://arxiv.org/abs/2511.03724v2
- Date: Fri, 07 Nov 2025 16:11:08 GMT
- Title: Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning
- Authors: Richard Dewey, Janos Botyanszki, Ciamac C. Moallemi, Andrew T. Zheng,
- Abstract summary: We present Solly, the first AI agent to achieve elite human play in reduced-format Liar's Poker. We trained Solly using self-play with a model-free, actor-critic, deep reinforcement learning algorithm. Solly played at an elite human level as measured by win rate (won over 50% of hands) and equity (money won) in heads-up and multi-player Liar's Poker.
- Score: 0.5249805590164902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI researchers have long focused on poker-like games as a testbed for environments characterized by multi-player dynamics, imperfect information, and reasoning under uncertainty. While recent breakthroughs have matched elite human play at no-limit Texas hold'em, the multi-player dynamics are subdued: most hands converge quickly with only two players engaged through multiple rounds of bidding. In this paper, we present Solly, the first AI agent to achieve elite human play in reduced-format Liar's Poker, a game characterized by extensive multi-player engagement. We trained Solly using self-play with a model-free, actor-critic, deep reinforcement learning algorithm. Solly played at an elite human level as measured by win rate (won over 50% of hands) and equity (money won) in heads-up and multi-player Liar's Poker. Solly also outperformed large language models (LLMs), including those with reasoning abilities, on the same metrics. Solly developed novel bidding strategies, randomized play effectively, and was not easily exploitable by world-class human players.
Related papers
- AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games [63.29377274531968]
We introduce the AI GameStore, a scalable and open-ended platform to synthesize new representative human games. We generate 100 such games based on the top charts of Apple App Store and Steam, and evaluate seven frontier vision-language models (VLMs) on short episodes of play. The best models achieved less than 10% of the human average score on the majority of the games, and especially struggled with games that challenge world-model learning, memory and planning.
arXiv Detail & Related papers (2026-02-19T18:17:25Z)
- People use fast, flat goal-directed simulation to reason about novel problems [68.55490343866545]
We show that people are systematic and adaptively rational in how they play a game for the first time. We explain these capacities via a computational cognitive model that we call the "Intuitive Gamer". Our work offers new insights into how people rapidly evaluate, act, and make suggestions when encountering novel problems.
arXiv Detail & Related papers (2025-10-13T15:12:08Z)
- SpinGPT: A Large-Language-Model Approach to Playing Poker Correctly [2.5788559173418357]
We present SpinGPT, the first large language model tailored to Spin & Go, a popular three-player online poker format. Our results show that SpinGPT matches the solver's actions in 78% of decisions (tolerant accuracy). These results suggest that LLMs could be a new way to deal with multi-player imperfect-information games like poker.
arXiv Detail & Related papers (2025-09-26T14:15:44Z)
- PokerGPT: An End-to-End Lightweight Solver for Multi-Player Texas Hold'em via Large Language Model [14.14786217204364]
Poker, also known as Texas Hold'em, has always been a typical research target within imperfect information games (IIGs).
We introduce PokerGPT, an end-to-end solver for playing Texas Hold'em with an arbitrary number of players and gaining high win rates.
arXiv Detail & Related papers (2024-01-04T13:27:50Z)
- DanZero+: Dominating the GuanDan Game through Reinforcement Learning [95.90682269990705]
We develop an AI program for an exceptionally complex and popular card game called GuanDan.
We first put forward an AI program named DanZero for this game.
To further enhance the AI's capabilities, we apply a policy-based reinforcement learning algorithm to GuanDan.
arXiv Detail & Related papers (2023-12-05T08:07:32Z)
- Guarantees for Self-Play in Multiplayer Games via Polymatrix Decomposability [2.2636685010313364]
Self-play is a technique for machine learning in multi-agent systems where a learning algorithm learns by interacting with copies of itself.
We show that in two-player constant-sum games, self-play that reaches Nash equilibrium is guaranteed to produce strategies that perform well against any post-training opponent.
For the first time, our results identify a structural property of multiplayer games that enables performance guarantees for the strategies produced by a broad class of self-play algorithms.
arXiv Detail & Related papers (2023-10-17T18:33:21Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning [86.37438204416435]
Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered.
Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome.
DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform.
arXiv Detail & Related papers (2022-06-30T15:53:19Z)
- Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games [31.97631243571394]
We introduce a framework, LMAC, that automates the discovery of the update rule without explicit human design.
Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance.
We show that LMAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO.
arXiv Detail & Related papers (2021-06-04T22:30:25Z)
- Suphx: Mastering Mahjong with Deep Reinforcement Learning [114.68233321904623]
We design an AI for Mahjong, named Suphx, based on deep reinforcement learning with some newly introduced techniques.
Suphx has demonstrated stronger performance than most top human players in terms of stable rank.
This is the first time that a computer program outperforms most top human players in Mahjong.
arXiv Detail & Related papers (2020-03-30T16:18:16Z)
- Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.