Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4
- URL: http://arxiv.org/abs/2309.17277v3
- Date: Sat, 31 Aug 2024 11:50:41 GMT
- Title: Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4
- Authors: Jiaxian Guo, Bo Yang, Paul Yoo, Bill Yuchen Lin, Yusuke Iwasawa, Yutaka Matsuo
- Abstract summary: GPT-4, the recent breakthrough in large language models (LLMs) trained on massive passive data, is notable for its knowledge retrieval and reasoning abilities.
This paper delves into the applicability of GPT-4's learned knowledge for imperfect information games.
We introduce Suspicion-Agent, an innovative agent that leverages GPT-4's capabilities for performing in imperfect information games.
- Score: 37.64921394844022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unlike perfect information games, where all elements are known to every player, imperfect information games emulate the real-world complexities of decision-making under uncertain or incomplete information. GPT-4, the recent breakthrough in large language models (LLMs) trained on massive passive data, is notable for its knowledge retrieval and reasoning abilities. This paper delves into the applicability of GPT-4's learned knowledge for imperfect information games. To achieve this, we introduce Suspicion-Agent, an innovative agent that leverages GPT-4's capabilities for performing in imperfect information games. With proper prompt engineering to achieve different functions, Suspicion-Agent based on GPT-4 demonstrates remarkable adaptability across a range of imperfect information card games. Importantly, GPT-4 displays a strong high-order theory of mind (ToM) capacity, meaning it can understand others and intentionally impact others' behavior. Leveraging this, we design a planning strategy that enables GPT-4 to competently play against different opponents, adapting its gameplay style as needed, while requiring only the game rules and descriptions of observations as input. In the experiments, we qualitatively showcase the capabilities of Suspicion-Agent across three different imperfect information games and then quantitatively evaluate it in Leduc Hold'em. The results show that Suspicion-Agent can potentially outperform traditional algorithms designed for imperfect information games, without any specialized training or examples. In order to encourage and foster deeper insights within the community, we make our game-related data publicly available.
Related papers
- Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay [0.0]
We use games like Tic-Tac-Toe, Connect Four, and Battleship to assess strategic thinking and decision-making.
Despite their proficiency on standard benchmarks, GPT-3.5 and GPT-4's abilities to play and reason about fully observable games without pre-training are mediocre.
arXiv Detail & Related papers (2024-07-12T14:17:26Z)
- Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games [56.70628673595041]
Large Language Models (LLMs) have been increasingly used in real-world settings, yet their strategic decision-making abilities remain largely unexplored.
This work investigates the performance and merits of LLMs in canonical game-theoretic two-player non-zero-sum games, Stag Hunt and Prisoner's Dilemma.
Our structured evaluation of GPT-3.5, GPT-4-Turbo, GPT-4o, and Llama-3-8B shows that these models, when making decisions in these games, are affected by at least one of the following systematic biases.
arXiv Detail & Related papers (2024-07-05T12:30:02Z)
- Will GPT-4 Run DOOM? [0.0]
We show that GPT-4's reasoning and planning capabilities extend to the 1993 first-person shooter Doom.
We find that GPT-4 can play the game to a passable degree: it is able to manipulate doors, combat enemies, and perform pathing.
arXiv Detail & Related papers (2024-03-08T17:30:41Z)
- GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks.
We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds.
Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
arXiv Detail & Related papers (2023-11-27T11:29:10Z)
- Generative AI in Mafia-like Game Simulation [2.44755919161855]
The study aimed to showcase the model's potential in understanding, decision-making, and interaction during game scenarios.
The findings suggest that while GPT-4 exhibits promising advancements over earlier models, there remains potential for further development.
arXiv Detail & Related papers (2023-09-20T22:38:34Z)
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models [92.6951708781736]
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5.
We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information.
Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv Detail & Related papers (2023-06-20T17:24:23Z)
- Towards Reliable Misinformation Mitigation: Generalization, Uncertainty, and GPT-4 [5.313670352036673]
We show that GPT-4 can outperform prior methods in multiple settings and languages.
We propose techniques to handle uncertainty that can detect impossible examples and strongly improve outcomes.
This research lays the groundwork for future tools that can drive real-world progress to combat misinformation.
arXiv Detail & Related papers (2023-05-24T09:10:20Z)
- Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z)
- PerfectDou: Dominating DouDizhu with Perfect Information Distillation [51.069043489706836]
We propose PerfectDou, a state-of-the-art DouDizhu AI system that dominates the game.
In experiments we show how and why PerfectDou beats all existing AI programs, and achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-03-30T15:37:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.