Fugu-MT Paper Translation (Summary): Beyond the high score: Prosocial ability profiles of multi-agent populations

Paper summary: Beyond the high score: Prosocial ability profiles of multi-agent populations

arxiv url: http://arxiv.org/abs/2509.14485v1
Date: Wed, 17 Sep 2025 23:29:39 GMT
Status: Translation complete
Last updated in system: 2025-09-19 17:26:53.003374
Title: Beyond the high score: Prosocial ability profiles of multi-agent populations
Authors: Marko Tesic, Yue Zhao, Joel Z. Leibo, Rakshit S. Trivedi, Jose Hernandez-Orallo
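Abstract summary: The Melting Pot contest is a social AI evaluation suite designed to assess the cooperation capabilities of AI systems. We apply a Bayesian approach known as Measurement Layouts to infer the capability profiles of multi-agent systems in the Melting Pot contest. These capability profiles not only predict future performance within the Melting Pot suite but also reveal the underlying prosocial abilities of agents.
Reference score (attention score computed by the site): 7.740015167057365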
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The development and evaluation of social capabilities in AI agents require complex environments where competitive and cooperative behaviours naturally emerge. While game-theoretic properties can explain why certain teams or agent populations outperform others, more abstract behaviours, such as convention following, are harder to control in training and evaluation settings. The Melting Pot contest is a social AI evaluation suite designed to assess the cooperation capabilities of AI systems. In this paper, we apply a Bayesian approach known as Measurement Layouts to infer the capability profiles of multi-agent systems in the Melting Pot contest. We show that these capability profiles not only predict future performance within the Melting Pot suite but also reveal the underlying prosocial abilities of agents. Our analysis indicates that while higher prosocial capabilities sometimes correlate with better performance, this is not a universal trend: some lower-scoring agents exhibit stronger cooperation abilities. Furthermore, we find that top-performing contest submissions are more likely to achieve high scores in scenarios where prosocial capabilities are not required. These findings, together with reports that the contest winner used a hard-coded solution tailored to specific environments, suggest that at least one top-performing team may have optimised for conditions where cooperation was not necessary, potentially exploiting limitations in the evaluation framework. We provide recommendations for improving the annotation of cooperation demands and propose future research directions to account for biases introduced by different testing environments. Our results demonstrate that Measurement Layouts offer both strong predictive accuracy and actionable insights, contributing to a more transparent and generalisable approach to evaluating AI systems in complex social settings.
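The abstract does not spell out the model, so the following is only a minimal sketch of the general idea behind a measurement layout, under strong simplifying assumptions that are not taken from the paper: each agent has a single latent prosocial capability, each scenario carries a hypothetical annotated cooperation demand, success probability follows an assumed logistic link on (capability − demand) with an assumed `slope`, and the posterior is computed on a grid under a uniform prior. The function names (`success_prob`, `posterior_grid`, `predict`) and the toy data are illustrative only; the paper's actual layouts are richer.

```python
import math

# Minimal sketch of measurement-layout-style inference (ASSUMED model, not the paper's):
# one latent prosocial capability per agent, one annotated cooperation demand per scenario.


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def success_prob(capability: float, demand: float, slope: float = 4.0) -> float:
    """Assumed logistic link: success becomes likely once capability exceeds demand."""
    return sigmoid(slope * (capability - demand))


def posterior_grid(observations, grid, slope: float = 4.0):
    """Grid approximation of p(capability | outcomes) under a uniform prior over `grid`."""
    log_post = []
    for c in grid:
        lp = 0.0
        for demand, success in observations:
            p = success_prob(c, demand, slope)
            lp += math.log(p if success else 1.0 - p)
        log_post.append(lp)
    top = max(log_post)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_mean(grid, post) -> float:
    return sum(c * p for c, p in zip(grid, post))


def predict(grid, post, demand: float, slope: float = 4.0) -> float:
    """Posterior predictive probability of success on a new scenario with this demand."""
    return sum(p * success_prob(c, demand, slope) for c, p in zip(grid, post))


# Hypothetical toy data: (cooperation_demand, success). Demand 0.0 means the
# scenario can be solved without any prosocial capability.
agent_a = [(0.0, True), (1.0, True), (1.5, True), (2.0, False), (2.5, False)]
agent_b = [(0.0, True), (0.0, True), (0.0, True), (0.0, True), (1.0, False)]

grid = [i / 100 for i in range(301)]  # candidate capability values in [0, 3]
post_a = posterior_grid(agent_a, grid)
post_b = posterior_grid(agent_b, grid)

raw_a = sum(s for _, s in agent_a)  # 3 successes out of 5
raw_b = sum(s for _, s in agent_b)  # 4 successes out of 5

print(f"raw scores: A={raw_a}, B={raw_b}")
print(f"mean capability: A={posterior_mean(grid, post_a):.2f}, B={posterior_mean(grid, post_b):.2f}")
print(f"P(success | demand=2.0): A={predict(grid, post_a, 2.0):.2f}, B={predict(grid, post_b, 2.0):.2f}")
```

With these toy numbers agent B has the higher raw score, but all of its successes come from zero-demand scenarios, so its inferred capability and its predicted success on a high-demand scenario are lower than agent A's. This mirrors in miniature the pattern the abstract describes; the numbers are illustrative, not results from the paper. A grid approximation is used here only to stay dependency-free; a realistic layout with many capabilities and demands would more likely be fitted with a probabilistic-programming library.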
