Beyond Game Theory Optimal: Profit-Maximizing Poker Agents for No-Limit Holdem
- URL: http://arxiv.org/abs/2509.23747v1
- Date: Sun, 28 Sep 2025 08:51:57 GMT
- Title: Beyond Game Theory Optimal: Profit-Maximizing Poker Agents for No-Limit Holdem
- Authors: SeungHyun Yi, Seungjun Yi
- Abstract summary: Monte-Carlo Counterfactual Regret Minimization (CFR) performs best in heads-up situations, and CFR remains the strongest method in most multi-way situations. Our approach aims to show how poker agents can move from merely not losing to consistently winning against diverse opponents.
- Score: 0.06610877051761614
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Game theory has grown into a major field over the past few decades, and poker has long served as one of its key case studies. Game-Theory-Optimal (GTO) play provides strategies that avoid loss in poker, but pure GTO does not guarantee maximum profit. To this end, we aim to develop a model that outperforms GTO strategies to maximize profit in No-Limit Holdem, in both heads-up (two-player) and multi-way (more than two players) situations. Our model builds on a GTO foundation and goes further to exploit opponents. The model first plays many simulated poker hands against itself and keeps adjusting its decisions until no action can reliably beat it, creating a strong baseline that is close to the theoretical best strategy. Then, it adapts by observing opponent behavior and adjusting its strategy to capture extra value accordingly. Our results indicate that Monte-Carlo Counterfactual Regret Minimization (CFR) performs best in heads-up situations and that CFR remains the strongest method in most multi-way situations. By combining the defensive strength of GTO with real-time exploitation, our approach aims to show how poker agents can move from merely not losing to consistently winning against diverse opponents.
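The self-play loop described in the abstract — keep adjusting decisions until no action can reliably beat the strategy — is the regret-matching update at the heart of Monte-Carlo CFR. Below is a minimal single-decision-point sketch; the action names and payoff vector are illustrative assumptions, not the paper's implementation, and a real CFR agent would apply this update at every information set of the game tree.

```python
import random

# Illustrative actions at one decision point (assumed, not from the paper).
ACTIONS = ["fold", "call", "raise"]

def regret_matching(cum_regret):
    """Turn cumulative positive regrets into a mixed strategy."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to the uniform strategy.
    return [1.0 / len(cum_regret)] * len(cum_regret)

def train(payoffs, iters=10_000, seed=0):
    """Self-play against fixed counterfactual payoffs: regrets accumulate
    until the average strategy concentrates on the best action."""
    rng = random.Random(seed)
    cum_regret = [0.0] * len(ACTIONS)
    strategy_sum = [0.0] * len(ACTIONS)
    for _ in range(iters):
        strat = regret_matching(cum_regret)
        # Sample one action (the Monte-Carlo step), observe its utility.
        a = rng.choices(range(len(ACTIONS)), weights=strat)[0]
        util = payoffs[a]
        for i in range(len(ACTIONS)):
            # Regret of not having played action i instead.
            cum_regret[i] += payoffs[i] - util
            strategy_sum[i] += strat[i]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

# With fold=0, call=+1, raise=-1, the average strategy converges to "call".
avg = train([0.0, 1.0, -1.0])
```

It is the *average* strategy over iterations, not the final one, that converges toward equilibrium play in CFR; that is why the sketch accumulates `strategy_sum` rather than returning the last iterate.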
Related papers
- How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use [52.394999779049606]
Large Language Models (LLMs) are increasingly applied in high-stakes domains. LLMs fail to compete against traditional algorithms. We propose ToolPoker, a tool-integrated reasoning framework.
arXiv Detail & Related papers (2026-01-31T05:45:25Z) - SpinGPT: A Large-Language-Model Approach to Playing Poker Correctly [2.5788559173418357]
We present SpinGPT, the first Large Language Model tailored to Spin & Go, a popular three-player online poker format. Our results show that SpinGPT matches the solver's actions in 78% of decisions (tolerant accuracy). These results suggest that LLMs could be a new way to deal with multi-player imperfect-information games like poker.
arXiv Detail & Related papers (2025-09-26T14:15:44Z) - Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory [0.0]
We present compelling supporting evidence of strategic intelligence in Large Language Models (LLMs). We conduct the first ever series of evolutionary Iterated Prisoner's Dilemma (IPD) tournaments, pitting canonical strategies against agents from the leading frontier AI companies OpenAI, Google, and Anthropic. Our results show that LLMs are highly competitive, consistently surviving and sometimes even proliferating in these complex ecosystems.
arXiv Detail & Related papers (2025-07-03T13:45:02Z) - A Benchmark for Generalizing Across Diverse Team Strategies in Competitive Pokémon [31.012853711707965]
Pokémon Video Game Championships (VGC) is a domain with an extraordinarily large space of possible team configurations. We introduce VGC-Bench: a benchmark that provides critical infrastructure, standardizes evaluation protocols, and supplies human-play datasets. In the restricted setting where an agent is trained and evaluated on a single-team configuration, our methods are able to win against a professional VGC competitor.
arXiv Detail & Related papers (2025-06-12T03:19:39Z) - PokerBench: Training Large Language Models to become Professional Poker Players [3.934572858193348]
We introduce PokerBench, a benchmark for evaluating the poker-playing abilities of large language models (LLMs). Poker, an incomplete-information game, demands a multitude of skills such as mathematics, reasoning, planning, strategy, and a deep understanding of game theory and human psychology. PokerBench consists of a comprehensive compilation of the 11,000 most important scenarios, split between pre-flop and post-flop play.
arXiv Detail & Related papers (2025-01-14T18:59:03Z) - All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization [57.615269148301515]
In a competitive game scenario, a set of agents have to learn decisions that maximize their goals and minimize their adversaries' goals at the same time.
We propose a novel model composed of three neural layers that learn a representation of a competitive game, learn how to map the strategy of specific opponents, and how to disrupt them.
Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
arXiv Detail & Related papers (2023-10-02T08:11:07Z) - Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis [3.4111723103928173]
We put ChatGPT and GPT-4 through the poker test and evaluate their poker skills.
Our findings reveal that while both models display an advanced understanding of poker, both ChatGPT and GPT-4 are NOT game theory optimal poker players.
arXiv Detail & Related papers (2023-08-23T23:16:35Z) - Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets [76.20705291443208]
We view adversarial attacks as a bargaining game in which different players negotiate to reach an agreement on a joint direction of parameter updating.
We design a novel framework that adjusts the budgets of different adversaries to avoid any player dominance.
Experiments on standard benchmarks show that applying the proposed framework to existing approaches significantly advances multi-target robustness.
arXiv Detail & Related papers (2023-06-27T14:02:10Z) - ApproxED: Approximate exploitability descent via learned best responses [61.17702187957206]
We study the problem of finding an approximate Nash equilibrium of games with continuous action sets.
We propose two new methods that minimize an approximation of exploitability with respect to the strategy profile.
arXiv Detail & Related papers (2023-01-20T23:55:30Z) - Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z) - Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
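For reference, both regret guarantees quoted in the last entry are sublinear in the number of steps; the notation below is the standard one, assumed rather than taken from the paper:

```latex
% Regret after $T$ steps, against the best fixed strategy in hindsight:
\[
  \mathrm{Regret}(T) = \tilde{\mathcal{O}}\!\left(\sqrt{T}\right)
  \quad\text{(self-play)}
  \qquad
  \mathrm{Regret}(T) = \tilde{\mathcal{O}}\!\left(T^{2/3}\right)
  \quad\text{(explore-then-exploit)}
\]
% Both satisfy $\mathrm{Regret}(T)/T \to 0$, so average per-step regret
% vanishes; but $\sqrt{T} \ll T^{2/3}$ for large $T$, so the self-play
% bound is tighter and the explore-then-exploit bound trades regret for
% worst-case runtime.
```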
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.