Fugu-MT Paper Translation (Abstract): Don't Make the LLM Read the Graph: Make the Graph Think

Paper summary: Don't Make the LLM Read the Graph: Make the Graph Think

arxiv url: http://arxiv.org/abs/2604.23057v1
Date: Fri, 24 Apr 2026 22:56:19 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.126972
Title: Don't Make the LLM Read the Graph: Make the Graph Think
Title (reference translation): Don't Make the LLM Read the Graph: Make the Graph Think
Authors: Yuqi Sun, Tianqin Meng, George Liu, Yashraj Panwar, Lakshya Chaudhry, Munasib Ilham, Aman Chadha
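Abstract summary: We investigate whether explicit belief graphs improve LLM performance in the cooperative multi-agent card game Hanabi. Through 3,000+ controlled trials, we establish four findings.
Reference score (independently computed attention measure): 13.277447097714395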
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We investigate whether explicit belief graphs improve LLM performance in cooperative multi-agent reasoning. Through 3,000+ controlled trials across four LLM families in the cooperative card game Hanabi, we establish four findings. First, integration architecture determines whether belief graphs provide value: as prompt context, graphs are decorative for strong models and beneficial only for weak models on 2nd-order Theory of Mind (80% vs 10%, p<0.0001, OR=36.0); when graphs gate action selection through ranked shortlists, they become structurally essential even for strong models (100% vs 20% on 2nd-order ToM, p<0.001). Second, we identify "Planner Defiance," a model-family-specific failure where LLMs override correct planner recommendations at partial competence (90% override, replicated N=20); Gemini models show near-zero defiance while Llama 70B shows 90%, and models distinguish factual context (deferred to) from advisory recommendations (overridden). Third, full-game evidence confirms inter-agent conventions (+128% over baseline, p=0.003) outperform all single-agent interventions, and individual belief-graph components must be combined to produce gains. Fourth, preliminary scaling analysis (N=10/cell, exploratory) suggests graph depth has diminishing returns: shallow graphs provide the best cost-benefit ratio, while deeper ToM graphs appear harmful at larger player counts (-1.5 pts at 5-player, p=0.029).
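Abstract (reference translation): We investigate whether explicit belief graphs improve LLM performance in cooperative multi-agent reasoning. Through 3,000+ controlled trials across four LLM families in the cooperative card game Hanabi, we obtain four findings. First, the integration architecture determines whether belief graphs provide value: as prompt context, graphs are decorative for strong models and help only weak models on 2nd-order Theory of Mind (80% vs 10%, p<0.0001, OR=36.0); when graphs gate action selection through ranked shortlists, they become structurally essential even for strong models (100% vs 20% on 2nd-order ToM, p<0.001). Second, we identify "Planner Defiance," a model-family-specific failure in which LLMs override correct planner recommendations at partial competence (90% override, replicated N=20); Gemini models show near-zero defiance while Llama 70B shows 90%, and models distinguish factual context (deferred to) from advisory recommendations (overridden). Third, full-game evidence confirms that inter-agent conventions (+128% over baseline, p=0.003) outperform all single-agent interventions, and individual belief-graph components must be combined to produce gains. Fourth, a preliminary scaling analysis (N=10/cell, exploratory) suggests that graph depth has diminishing returns: shallow graphs provide the best cost-benefit ratio, while deeper ToM graphs appear harmful at larger player counts (-1.5 pts at 5-player, p=0.029).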
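
The odds ratio quoted in the abstract for weak models on 2nd-order Theory of Mind (80% vs 10%, OR=36.0) can be checked from the two success rates alone. The following is a minimal sketch of that arithmetic; the function name `odds_ratio` is a hypothetical helper introduced here for illustration and is not taken from the paper.

```python
def odds_ratio(p_treatment: float, p_control: float) -> float:
    """Odds ratio between two success probabilities.

    Hypothetical helper for illustration only. Both inputs must lie
    strictly between 0 and 1, because the odds p / (1 - p) are
    unbounded at p = 1 and the ratio is undefined when the control
    odds are 0.
    """
    if not (0.0 < p_treatment < 1.0 and 0.0 < p_control < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    odds_treatment = p_treatment / (1.0 - p_treatment)  # 0.80 -> 4
    odds_control = p_control / (1.0 - p_control)        # 0.10 -> 1/9
    return odds_treatment / odds_control


# Weak models, 2nd-order ToM: 80% with the graph as prompt context vs 10% without.
print(round(odds_ratio(0.80, 0.10), 1))  # prints 36.0
```

The result, (0.8/0.2) / (0.1/0.9) = 4 × 9 = 36, matches the reported OR=36.0. The 100% vs 20% comparison for strong models has no finite sample odds ratio, so this sketch rejects that input rather than returning a value.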

Related papers