Related papers: PokerBench: Training Large Language Models to become Professional Poker Players

URL: http://arxiv.org/abs/2501.08328v2
Date: Fri, 24 Jan 2025 20:15:10 GMT
Title: PokerBench: Training Large Language Models to become Professional Poker Players
Authors: Richard Zhuang, Akshat Gupta, Richard Yang, Aniket Rahane, Zhengyu Li, Gopala Anumanchipalli
Abstract summary: We introduce PokerBench, a benchmark for evaluating the poker-playing abilities of large language models (LLMs). Poker, an incomplete information game, demands a multitude of skills such as mathematics, reasoning, planning, strategy, and a deep understanding of game theory and human psychology. PokerBench consists of a comprehensive compilation of 11,000 most important scenarios, split between pre-flop and post-flop play.
Score: 3.934572858193348
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce PokerBench - a benchmark for evaluating the poker-playing abilities of large language models (LLMs). As LLMs excel in traditional NLP tasks, their application to complex, strategic games like poker poses a new challenge. Poker, an incomplete information game, demands a multitude of skills such as mathematics, reasoning, planning, strategy, and a deep understanding of game theory and human psychology. This makes Poker the ideal next frontier for large language models. PokerBench consists of a comprehensive compilation of 11,000 most important scenarios, split between pre-flop and post-flop play, developed in collaboration with trained poker players. We evaluate prominent models including GPT-4, ChatGPT 3.5, and various Llama and Gemma series models, finding that all state-of-the-art LLMs underperform in playing optimal poker. However, after fine-tuning, these models show marked improvements. We validate PokerBench by having models with different scores compete with each other, demonstrating that higher scores on PokerBench lead to higher win rates in actual poker games. Through gameplay between our fine-tuned model and GPT-4, we also identify limitations of simple supervised fine-tuning for learning optimal playing strategy, suggesting the need for more advanced methodologies for effectively training language models to excel in games. PokerBench thus presents a unique benchmark for a quick and reliable evaluation of the poker-playing ability of LLMs as well as a comprehensive benchmark to study the progress of LLMs in complex game-playing scenarios.

Related papers

Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees [91.88803125231189]
Multi-step Preference Optimization (MPO) is built upon the natural actor-critic framework \citep{rakhlin2013online,joulani17a}. We show that OMPO requires $\mathcal{O}(\epsilon^{-1})$ policy updates to converge to an $\epsilon$-approximate Nash equilibrium. We also validate the effectiveness of our method on a multi-turn conversation dataset and a math reasoning dataset.
arXiv Detail & Related papers (2025-02-18T09:33:48Z)
Instruction-Driven Game Engine: A Poker Case Study [53.689520884467065]
The IDGE project aims to democratize game development by enabling a large language model to follow free-form game descriptions and generate game-play processes. We train the IDGE in a curriculum manner that progressively increases its exposure to complex scenarios. Our initial progress lies in developing an IDGE for Poker, which not only supports a wide range of poker variants but also allows for highly individualized new poker games through natural language inputs.
arXiv Detail & Related papers (2024-10-17T11:16:27Z)
Instruction-Driven Game Engines on Large Language Models [59.280666591243154]
The IDGE project aims to democratize game development by enabling a large language model to follow free-form game rules. We train the IDGE in a curriculum manner that progressively increases the model's exposure to complex scenarios. Our initial progress lies in developing an IDGE for Poker, a universally cherished card game.
arXiv Detail & Related papers (2024-03-30T08:02:16Z)
PokerGPT: An End-to-End Lightweight Solver for Multi-Player Texas Hold'em via Large Language Model [14.14786217204364]
Poker, also known as Texas Hold'em, has always been a typical research target within imperfect information games (IIGs). We introduce PokerGPT, an end-to-end solver for playing Texas Hold'em with an arbitrary number of players and gaining high win rates.
arXiv Detail & Related papers (2024-01-04T13:27:50Z)
A Survey on Game Theory Optimal Poker [0.0]
No non-trivial imperfect information game has been solved to date. This makes poker a great test bed for Artificial Intelligence research. We discuss the intricacies of abstraction techniques, betting models, and specific strategies employed by successful poker bots.
arXiv Detail & Related papers (2024-01-02T04:19:25Z)
All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization [57.615269148301515]
In a competitive game scenario, a set of agents have to learn decisions that maximize their goals and minimize their adversaries' goals at the same time. We propose a novel model composed of three neural layers that learn a representation of a competitive game, learn how to map the strategy of specific opponents, and how to disrupt them. Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
arXiv Detail & Related papers (2023-10-02T08:11:07Z)
Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis [3.4111723103928173]
We put ChatGPT and GPT-4 through the poker test and evaluate their poker skills. Our findings reveal that while both models display an advanced understanding of poker, both ChatGPT and GPT-4 are NOT game theory optimal poker players.
arXiv Detail & Related papers (2023-08-23T23:16:35Z)
PokerKit: A Comprehensive Python Library for Fine-Grained Multi-Variant Poker Game Simulations [40.39759037668144]
PokerKit is an open-source Python library designed to overcome the restrictions of existing poker game simulation and hand evaluation tools. It supports an extensive array of poker variants and provides a flexible architecture for users to define their custom games. The flexibility of PokerKit allows for applications in diverse areas, such as poker AI development, tool creation, and online poker casino implementation.
arXiv Detail & Related papers (2023-08-08T13:54:48Z)
SPRING: Studying the Paper and Reasoning to Play Games [102.5587155284795]
We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM). In experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment. Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories.
arXiv Detail & Related papers (2023-05-24T18:14:35Z)
Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games [31.97631243571394]
We introduce a framework, LMAC, that automates the discovery of the update rule without explicit human design. Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance. We show that LMAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO.
arXiv Detail & Related papers (2021-06-04T22:30:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.