Fugu-MT Paper Translation (Summary): LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game

Paper summary: LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game

arxiv url: http://arxiv.org/abs/2510.04980v1
Date: Mon, 06 Oct 2025 16:17:24 GMT
Status: Translation complete
System update date: 2025-10-07 16:52:59.977453
Title: LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game
Title (reference translation): LLM-Hanabi: Evaluating Multi-Agent Gameplay in an Imperfect-Information Collaboration Game
Authors: Fangzhou Liang, Tianshi Zheng, Chunkit Chan, Yauwai Yim, Yangqiu Song
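Abstract summary: This study introduces LLM-Hanabi, a new benchmark that uses the cooperative game Hanabi to evaluate rationale inference and ToM. Across a range of models, a significant positive correlation was found between ToM and in-game success. The authors conclude that prioritizing first-order ToM is a promising direction for improving the collaborative capabilities of future models.
Reference score (independently computed attention score): 47.019077016616144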
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Effective multi-agent collaboration requires agents to infer the rationale behind others' actions, a capability rooted in Theory-of-Mind (ToM). While recent Large Language Models (LLMs) excel at logical inference, their ability to infer rationale in dynamic, collaborative settings remains under-explored. This study introduces LLM-Hanabi, a novel benchmark that uses the cooperative game Hanabi to evaluate the rationale inference and ToM of LLMs. Our framework features an automated evaluation system that measures both game performance and ToM proficiency. Across a range of models, we find a significant positive correlation between ToM and in-game success. Notably, first-order ToM (interpreting others' intent) correlates more strongly with performance than second-order ToM (predicting others' interpretations). These findings highlight that for effective AI collaboration, the ability to accurately interpret a partner's rationale is more critical than higher-order reasoning. We conclude that prioritizing first-order ToM is a promising direction for enhancing the collaborative capabilities of future models.
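Abstract (reference translation): Effective multi-agent collaboration requires agents to infer the reasoning behind others' actions, a capability rooted in Theory-of-Mind (ToM). Recent Large Language Models (LLMs) excel at logical inference, but their ability to infer rationale in dynamic, collaborative settings has not yet been thoroughly explored. This study introduces LLM-Hanabi, a new benchmark that uses the cooperative game Hanabi to evaluate the rationale inference and ToM of LLMs. The framework includes an automated evaluation system that measures both game performance and ToM proficiency. Across a range of models, a significant positive correlation was found between ToM and in-game success. In particular, first-order ToM (interpreting others' intent) correlates more strongly with performance than second-order ToM (predicting others' interpretations). These results emphasize that, for effective AI collaboration, the ability to accurately interpret a partner's rationale matters more than higher-order reasoning. The authors conclude that prioritizing first-order ToM is a promising direction for improving the collaborative capabilities of future models.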


Related papers