Game of Thought: Robust Information Seeking with Large Language Models Using Game Theory
- URL: http://arxiv.org/abs/2602.01708v1
- Date: Mon, 02 Feb 2026 06:33:18 GMT
- Title: Game of Thought: Robust Information Seeking with Large Language Models Using Game Theory
- Authors: Langyuan Cui, Chun Kai Ling, Hwee Tou Ng
- Abstract summary: We use the game of Twenty Questions to evaluate the information-seeking ability of Large Language Models (LLMs). We propose Game of Thought (GoT), a framework that applies game-theoretic techniques to approximate a Nash equilibrium (NE) strategy for the restricted variant of the game.
- Score: 37.51238507036326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly deployed in real-world scenarios where they may lack sufficient information to complete a given task. In such settings, the ability to actively seek out missing information becomes a critical capability. Existing approaches to enhancing this ability often rely on simplifying assumptions that degrade worst-case performance. This is an issue with serious implications in high-stakes applications. In this work, we use the game of Twenty Questions to evaluate the information-seeking ability of LLMs. We introduce and formalize its adversarial counterpart, the Strategic Language Search (SLS) problem along with its variants as a two-player zero-sum extensive form game. We propose Game of Thought (GoT), a framework that applies game-theoretic techniques to approximate a Nash equilibrium (NE) strategy for the restricted variant of the game. Empirical results demonstrate that our approach consistently improves worst-case performance compared to (1) direct prompting-based methods and (2) heuristic-guided search methods across all tested settings.
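The core idea, approximating a Nash equilibrium of a two-player zero-sum game, can be illustrated on a toy matrix game. The sketch below uses fictitious play, a standard iterative method (not the paper's actual GoT algorithm), on rock-paper-scissors; the payoff matrix, function names, and iteration count are illustrative assumptions.

```python
# Fictitious play on a two-player zero-sum matrix game: each player
# repeatedly best-responds to the opponent's empirical mixture of past
# plays. The time-averaged strategies approximate a Nash equilibrium.
# Illustrative sketch only; not the SLS game or GoT from the paper.

def fictitious_play(payoff, iters=20000):
    """Approximate a NE of a zero-sum matrix game.

    payoff[i][j] is the row player's payoff when row plays i and
    column plays j. Returns both players' empirical strategies.
    """
    m, n = len(payoff), len(payoff[0])
    row_counts = [0] * m
    col_counts = [0] * n
    for _ in range(iters):
        # Row maximizes, column minimizes, against the opponent's
        # accumulated empirical play.
        row_br = max(range(m), key=lambda i: sum(payoff[i][j] * col_counts[j] for j in range(n)))
        col_br = min(range(n), key=lambda j: sum(payoff[i][j] * row_counts[i] for i in range(m)))
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    return ([c / iters for c in row_counts],
            [c / iters for c in col_counts])

# Rock-paper-scissors: its unique NE is uniform play by both sides,
# so both empirical strategies should approach (1/3, 1/3, 1/3).
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
row_strat, col_strat = fictitious_play(rps)
```

Fictitious play is guaranteed to converge (in empirical frequencies) for zero-sum games, though convergence can be slow; GoT's extensive-form setting requires more machinery, but the worst-case guarantee of an equilibrium strategy is the same motivation.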
Related papers
- Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies [54.08697738311866]
Social deduction games like Werewolf combine language, reasoning, and strategy. We curate a high-quality, human-verified multimodal Werewolf dataset containing over 100 hours of video, 32.4M utterance tokens, and 15 rule variants. We propose a novel strategy-alignment evaluation that leverages the winning faction's strategies as ground truth in two stages.
arXiv Detail & Related papers (2025-10-13T13:33:30Z) - Look-ahead Reasoning with a Learned Model in Imperfect Information Games [3.4935179780034242]
This paper introduces an algorithm that learns an abstracted model of an imperfect information game directly from the agent-environment interaction. During test time, this trained model is used to perform look-ahead reasoning. We empirically demonstrate that with sufficient capacity, LAMIR learns the exact underlying game structure, and with limited capacity, it still learns a valuable abstraction.
arXiv Detail & Related papers (2025-10-06T17:26:56Z) - Code World Models for General Game Playing [22.382021070682256]
We use Large Language Models to translate natural language rules and game trajectories into a formal, executable world model represented as Python code. This generated model serves as a verifiable simulation engine for high-performance planning algorithms. We find that our method outperforms or matches Gemini 2.5 Pro in 9 out of the 10 considered games.
arXiv Detail & Related papers (2025-10-06T07:16:07Z) - What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking [50.72154186522052]
Large language models (LLMs) excel at processing information reactively but lack the ability to systematically explore hypothetical futures. We propose WiA-LLM, a new paradigm that equips LLMs with proactive thinking capabilities. We validate WiA-LLM in Honor of Kings, a complex multiplayer game environment.
arXiv Detail & Related papers (2025-09-05T04:05:27Z) - Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy [37.54766836927425]
We present the first evaluation harness that enables any out-of-the-box, local Large Language Model (LLM) to play full-press Diplomacy without fine-tuning or specialized training. Previous work required frontier LLMs or fine-tuning, due to the high complexity and information density of Diplomacy's game state. Our harness democratizes the evaluation of strategic reasoning in LLMs by eliminating the need for fine-tuning, and it provides insights into how these capabilities emerge naturally from widely used LLMs.
arXiv Detail & Related papers (2025-08-10T21:07:08Z) - GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations [87.99872683336395]
Large Language Models (LLMs) are integrated into critical real-world applications.
This paper evaluates LLMs' reasoning abilities in competitive environments.
We first propose GTBench, a language-driven environment comprising 10 widely recognized tasks.
arXiv Detail & Related papers (2024-02-19T18:23:36Z) - SPRING: Studying the Paper and Reasoning to Play Games [102.5587155284795]
We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM).
In experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment.
Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories.
arXiv Detail & Related papers (2023-05-24T18:14:35Z) - Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
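The exploitability metric mentioned in the last abstract, how much a best-responding opponent can gain against a fixed strategy, is the worst-case quantity that equilibrium-seeking methods drive toward zero. A minimal sketch on a hypothetical rock-paper-scissors payoff matrix (the function name and example are illustrative, not from the cited paper):

```python
# Exploitability of a row strategy in a zero-sum matrix game: the gap
# between the game's value and what the row player actually secures
# against a best-responding column player. Zero means unexploitable.

def exploitability(payoff, row_strategy, game_value=0.0):
    """Best-response gain of the column player against a fixed row strategy."""
    n = len(payoff[0])
    # The column player minimizes the row player's expected payoff.
    br_value = min(
        sum(row_strategy[i] * payoff[i][j] for i in range(len(payoff)))
        for j in range(n)
    )
    return game_value - br_value

rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
uniform = exploitability(rps, [1/3, 1/3, 1/3])  # 0.0: equilibrium play
all_rock = exploitability(rps, [1.0, 0.0, 0.0])  # 1.0: paper wins every round
```

A learning algorithm with "low exploitability", as in the abstract above, is one whose output strategy keeps this quantity small even against an adversary that knows the strategy exactly.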
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.