Fugu-MT paper translation (abstract): How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use

Paper overview: How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use

arxiv url: http://arxiv.org/abs/2602.00528v1
Date: Sat, 31 Jan 2026 05:45:25 GMT
Status: Translation complete
Last updated in system: 2026-02-03 19:28:33.238508
Title: How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use
Title (reference translation): How far are LLMs from professional poker players? Revisiting game-theoretic reasoning with agentic tools
Authors: Minhua Lin, Enyan Dai, Hui Liu, Xianfeng Tang, Yuliang Yan, Zhenwei Dai, Jingying Zeng, Zhiwei Zhang, Fali Wang, Hongcheng Gao, Chen Luo, Xiang Zhang, Qi He, Suhang Wang
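Abstract summary: Large language models (LLMs) are increasingly applied in high-stakes domains. LLMs fail to compete against traditional algorithms. The authors propose ToolPoker, a tool-integrated reasoning framework.
Reference score (independently computed attention measure): 52.394999779049606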
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As Large Language Models (LLMs) are increasingly applied in high-stakes domains, their ability to reason strategically under uncertainty becomes critical. Poker provides a rigorous testbed, requiring not only strong actions but also principled, game-theoretic reasoning. In this paper, we conduct a systematic study of LLMs in multiple realistic poker tasks, evaluating both gameplay outcomes and reasoning traces. Our analysis reveals LLMs fail to compete against traditional algorithms and identifies three recurring flaws: reliance on heuristics, factual misunderstandings, and a "knowing-doing" gap where actions diverge from reasoning. An initial attempt with behavior cloning and step-level reinforcement learning improves reasoning style but remains insufficient for accurate game-theoretic play. Motivated by these limitations, we propose ToolPoker, a tool-integrated reasoning framework that combines external solvers for GTO-consistent actions with more precise professional-style explanations. Experiments demonstrate that ToolPoker achieves state-of-the-art gameplay while producing reasoning traces that closely reflect game-theoretic principles.
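Abstract (reference translation): As large language models (LLMs) are increasingly applied in high-stakes domains, their ability to reason strategically under uncertainty becomes important. Poker provides a rigorous testbed that requires not only strong actions but also game-theoretic reasoning. This paper presents a systematic study of LLMs on multiple realistic poker tasks, evaluating both gameplay outcomes and reasoning traces. The analysis shows that LLMs cannot compete with traditional algorithms and identifies three recurring flaws: reliance on heuristics, factual misunderstandings, and a "knowing-doing" gap in which actions diverge from reasoning. A first attempt using behavior cloning and step-level reinforcement learning improves reasoning style but is insufficient for accurate game-theoretic play. Motivated by these limitations, the authors propose ToolPoker, a tool-integrated reasoning framework that combines external solvers for GTO-consistent actions with more precise, professional-style explanations. Experiments show that ToolPoker achieves state-of-the-art gameplay while producing reasoning traces that closely reflect game-theoretic principles.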

Related papers