Fugu-MT Paper Translation (Abstract): Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models

Paper overview: Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models

arxiv url: http://arxiv.org/abs/2603.10098v1
Date: Tue, 10 Mar 2026 17:37:06 GMT
Status: translation complete
Last updated in system: 2026-03-12 16:22:32.635226
Title: Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models
Title (reference translation): Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models
Authors: Daniel Hennes, Zun Li, John Schultz, Marc Lanctot
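Abstract summary: Deep reinforcement learning oracles produce black-box neural network policies that are difficult to interpret, trust, or debug. We introduce Code-Space Response Oracles (CSRO), a novel framework that addresses this challenge. Our work presents a new perspective on multi-agent learning, shifting the focus from optimizing opaque policy parameters to interpretable algorithmic behavior.
Reference score (independently computed attention score): 8.649235365712004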
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in multi-agent reinforcement learning, particularly Policy-Space Response Oracles (PSRO), have enabled the computation of approximate game-theoretic equilibria in increasingly complex domains. However, these methods rely on deep reinforcement learning oracles that produce 'black-box' neural network policies, making them difficult to interpret, trust, or debug. We introduce Code-Space Response Oracles (CSRO), a novel framework that addresses this challenge by replacing RL oracles with Large Language Models (LLMs). CSRO reframes the best response computation as a code generation task, prompting an LLM to generate policies directly as human-readable code. This approach not only yields inherently interpretable policies but also leverages the LLM's pretrained knowledge to discover complex, human-like strategies. We explore multiple ways to construct and enhance an LLM-based oracle: zero-shot prompting, iterative refinement, and AlphaEvolve, a distributed LLM-based evolutionary system. We demonstrate that CSRO achieves performance competitive with baselines while producing a diverse set of explainable policies. Our work presents a new perspective on multi-agent learning, shifting the focus from optimizing opaque policy parameters to synthesizing interpretable algorithmic behavior.
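Abstract (reference translation): Recent progress in multi-agent reinforcement learning, in particular Policy-Space Response Oracles (PSRO), has made it possible to compute approximate game-theoretic equilibria in increasingly complex domains. These methods, however, depend on deep reinforcement learning oracles that generate 'black-box' neural network policies, which are hard to interpret, trust, or debug. We introduce Code-Space Response Oracles (CSRO), a new framework that addresses this problem by replacing RL oracles with Large Language Models (LLMs). CSRO recasts best-response computation as a code generation task in which an LLM generates policies directly as human-readable code. This approach yields inherently interpretable policies and also draws on the LLM's pretrained knowledge to discover complex, human-like strategies. We examine several ways to build and enhance an LLM-based oracle: zero-shot prompting, iterative refinement, and AlphaEvolve, a distributed LLM-based evolutionary system. We show that CSRO achieves performance competitive with baselines while producing a diverse set of explainable policies. This work offers a new perspective on multi-agent learning, shifting the focus from optimizing opaque policy parameters to synthesizing interpretable algorithmic behavior.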
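
The idea of treating best-response computation as code generation can be illustrated with a toy sketch. This is not the authors' implementation. It assumes rock-paper-scissors as the game, a hypothetical `stub_oracle` that searches a small fixed pool of hand-written policy sources in place of an LLM, and a simplified meta-strategy that puts all weight on the newest policy (iterated best response) rather than an equilibrium solver. All names here are illustrative.

```python
from typing import Callable, Dict, List

# Actions: 0 = rock, 1 = paper, 2 = scissors.
Policy = Callable[[], int]


def payoff(a: int, b: int) -> int:
    """Payoff to the player choosing `a` against `b` in rock-paper-scissors."""
    return (0, 1, -1)[(a - b) % 3]


# Hypothetical stand-in for LLM output: each policy is plain source code.
CANDIDATE_SOURCES: Dict[str, str] = {
    "always_rock": "def policy():\n    return 0\n",
    "always_paper": "def policy():\n    return 1\n",
    "always_scissors": "def policy():\n    return 2\n",
}


def compile_policy(source: str) -> Policy:
    """Turn policy source code into a callable."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["policy"]


def stub_oracle(opponents: List[str], weights: List[float]) -> str:
    """Return the candidate with the highest expected payoff against the mixture.

    A real code-space oracle would prompt an LLM here (assumption); this
    stub only searches the fixed candidate pool above.
    """
    def expected(name: str) -> float:
        me = compile_policy(CANDIDATE_SOURCES[name])
        return sum(
            w * payoff(me(), compile_policy(CANDIDATE_SOURCES[opp])())
            for opp, w in zip(opponents, weights)
        )

    return max(CANDIDATE_SOURCES, key=expected)


def run_loop(initial: str, max_iters: int = 10) -> List[str]:
    """Grow a population by repeatedly adding a best response."""
    population = [initial]
    for _ in range(max_iters):
        # Simplified meta-strategy: all weight on the newest policy.
        weights = [0.0] * (len(population) - 1) + [1.0]
        response = stub_oracle(population, weights)
        if response in population:
            break  # the oracle found no new policy
        population.append(response)
    return population


population = run_loop("always_rock")
```

Starting from `always_rock`, this loop adds `always_paper`, then `always_scissors`, and stops once the best response is already in the population. Every policy in the population remains readable source code, which is the property the abstract emphasizes.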

Related papers