Fugu-MT paper translation (abstract): The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation

Paper overview: The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation

arxiv url: http://arxiv.org/abs/2510.01295v1
Date: Wed, 01 Oct 2025 07:10:28 GMT
Status: Translation complete
Last updated in system: 2025-10-03 16:59:20.799036
Title: The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation
Title (reference translation): The Social Laboratory: A Psychometric Framework for Multi-Agent LLM Evaluation
Authors: Zarreen Reza
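Abstract summary: We introduce a new evaluation framework that uses multi-agent debate as a controlled "social laboratory". We show that assigned personas induce stable, measurable psychometric profiles, particularly in cognitive effort. This work provides a blueprint for a new class of dynamic, psychometrically grounded evaluation protocols.
Reference score (independently computed attention score): 0.16921396880325779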
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As Large Language Models (LLMs) transition from static tools to autonomous agents, traditional evaluation benchmarks that measure performance on downstream tasks are becoming insufficient. These methods fail to capture the emergent social and cognitive dynamics that arise when agents communicate, persuade, and collaborate in interactive environments. To address this gap, we introduce a novel evaluation framework that uses multi-agent debate as a controlled "social laboratory" to discover and quantify these behaviors. In our framework, LLM-based agents, instantiated with distinct personas and incentives, deliberate on a wide range of challenging topics under the supervision of an LLM moderator. Our analysis, enabled by a new suite of psychometric and semantic metrics, reveals several key findings. Across hundreds of debates, we uncover a powerful and robust emergent tendency for agents to seek consensus, consistently reaching high semantic agreement (μ > 0.88) even without explicit instruction and across sensitive topics. We show that assigned personas induce stable, measurable psychometric profiles, particularly in cognitive effort, and that the moderator's persona can significantly alter debate outcomes by structuring the environment, a key finding for external AI alignment. This work provides a blueprint for a new class of dynamic, psychometrically grounded evaluation protocols designed for the agentic setting, offering a crucial methodology for understanding and shaping the social behaviors of the next generation of AI agents. We have released the code and results at https://github.com/znreza/multi-agent-LLM-eval-for-debate.
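Abstract (reference translation): As large language models (LLMs) move from static tools to autonomous agents, conventional evaluation benchmarks that measure performance on downstream tasks are becoming insufficient. These methods fail to capture the emergent social and cognitive dynamics that arise when agents communicate, persuade, and cooperate in interactive environments. To address this gap, we introduce a new evaluation framework that uses multi-agent debate as a controlled "social laboratory" to discover and quantify these behaviors. In this framework, LLM-based agents instantiated with distinct personas and incentives deliberate on a wide range of challenging topics under the supervision of an LLM moderator. Our analysis, made possible by a new suite of psychometric and semantic metrics, yields several key findings. Across hundreds of debates, we find a strong and robust emergent tendency for agents to seek consensus, consistently reaching high semantic agreement (μ > 0.88) even without explicit instruction and even on sensitive topics. We show that assigned personas induce stable, measurable psychometric profiles, particularly in cognitive effort, and that the moderator's persona can significantly change debate outcomes by structuring the environment, a key finding for external AI alignment. This work provides a blueprint for a new class of dynamic, psychometrically grounded evaluation protocols designed for the agentic setting, and offers an important methodology for understanding and shaping the social behavior of the next generation of AI agents. The code and results are available at https://github.com/znreza/multi-agent-LLM-eval-for-debate.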
