Fugu-MT Paper Translation (Abstract): Evaluating Large Language Models in a Complex Hidden Role Game

Paper overview: Evaluating Large Language Models in a Complex Hidden Role Game

arxiv url: http://arxiv.org/abs/2605.22826v1
Date: Thu, 09 Apr 2026 14:02:14 GMT
Status: Translation complete
Last updated in system: 2026-06-15 07:09:36.439101
Title: Evaluating Large Language Models in a Complex Hidden Role Game
Title (reference translation): Evaluating Large Language Models in a Complex Hidden Role Game
Authors: Niklas Bauer
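Abstract summary: Quantifying the deceptive potential of Large Language Models (LLMs) is critical for AI safety, yet difficult to achieve in uncontrolled environments. This work investigates the reasoning, persuasion, and deceptive capabilities of LLMs within the social deduction game Secret Hitler.
Reference score (independently computed attention score): 0.0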
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Quantifying the deceptive potential of Large Language Models (LLMs) is critical for AI safety, yet difficult to achieve in uncontrolled environments. This work investigates the reasoning, persuasion, and deceptive capabilities of LLMs within the social deduction game Secret Hitler. I introduce an open-source framework and novel metrics to measure performance: Role Identification Accuracy, Deception Retention Rate, and Game State Impact Rate. By benchmarking models against rule-based algorithms and human games, I identify a gap between conversational ability and strategic depth. The study also analyzes the impact of reasoning-enhancement techniques on win rates and strategic reasoning. Neither Chain-of-Thought prompting nor internal memory brings improvements in performance, with up to 23.2% worse win rates for Fascist roles. While rule-based agents align with expert human voting decisions 86.7% of the time, models like Llama 3.1 70B achieve only 59.7% accuracy. Models playing as Fascists consistently yield negative impact scores and fail to sustain deception, resulting in roughly 40% shorter games compared to humans. These findings suggest that current architectures remain ineffective at complex, multi-turn manipulation. As capabilities advance, detecting when models begin to master these deceptive behaviors is crucial. The developed framework serves as a reproducible testbed for future alignment research.
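Abstract (reference translation): Quantifying the deceptive potential of large language models (LLMs) is important for AI safety, but hard to achieve in uncontrolled environments. This study examines the reasoning, persuasion, and deception capabilities of LLMs in the social deduction game Secret Hitler. I introduce an open-source framework and new metrics for measuring performance: Role Identification Accuracy, Deception Retention Rate, and Game State Impact Rate. Benchmarking the models against rule-based algorithms and human games reveals a gap between conversational ability and strategic depth. The study also analyzes how reasoning-enhancement techniques affect win rates and strategic reasoning. Neither Chain-of-Thought prompting nor internal memory improves performance, and win rates for Fascist roles are up to 23.2% worse. Rule-based agents match expert human voting decisions 86.7% of the time, whereas models such as Llama 3.1 70B reach only 59.7% accuracy. Models playing as Fascists consistently produce negative impact scores and cannot sustain deception, which results in games roughly 40% shorter than human games. These results suggest that current architectures remain ineffective at complex, multi-turn manipulation. As capabilities improve, it is essential to detect when models begin to master these deceptive behaviors. The developed framework serves as a reproducible testbed for future alignment research.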

Related papers