Fugu-MT Paper Translation (Abstract): The World Won't Stay Still: Programmable Evolution for Agent Benchmarks

Paper summary: The World Won't Stay Still: Programmable Evolution for Agent Benchmarks

arxiv url: http://arxiv.org/abs/2603.05910v1
Date: Fri, 06 Mar 2026 04:56:18 GMT
Status: Translation complete
Last updated in system: 2026-03-09 13:17:45.096054
Title: The World Won't Stay Still: Programmable Evolution for Agent Benchmarks
Title (reference translation): Programmable Evolution of Agent Benchmarks Still Won't Stop
Authors: Guangrui Li, Yaochen Xie, Yi Liu, Ziwei Dong, Xingyuan Pan, Tianqi Zheng, Jason Choi, Michael J. Morais, Binit Jha, Shaunak Mishra, Bingrou Zhou, Chen Luo, Monica Xiao Cheng, Dawn Song
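Abstract summary: LLM-based agents fulfill user requests by interacting with environments, querying data, and invoking tools in a multi-turn process. Most existing benchmarks assume static environments with fixed schemas and toolsets, neglecting both the evolutionary nature of the real world and agents' robustness to environmental changes. This paper proposes ProEvolve, a graph-based framework that makes environment evolution programmable.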
Reference score (independently computed attention score): 44.36372545284675
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM-powered agents fulfill user requests by interacting with environments, querying data, and invoking tools in a multi-turn process. Yet, most existing benchmarks assume static environments with fixed schemas and toolsets, neglecting the evolutionary nature of the real world and agents' robustness to environmental changes. In this paper, we study a crucial problem: how to evolve the agent environment in a scalable and controllable way, thereby better evaluating agents' adaptability to real-world dynamics. We propose ProEvolve, a graph-based framework that makes environment evolution programmable. At its core, a typed relational graph provides a unified, explicit representation of the environment: data, tools, and schema. Under this formalism, adding, removing, or modifying capabilities are expressed as graph transformations that coherently propagate updates across tools, schemas, and data access. Building on this, ProEvolve can (1) program the evolutionary dynamics as graph transformations to generate environments automatically, and (2) instantiate task sandboxes via subgraph sampling and programming. We validate ProEvolve by evolving a single environment into 200 environments and 3,000 task sandboxes, and benchmark representative agents accordingly.
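Abstract (reference translation): LLM-based agents fulfill user requests by interacting with environments, querying data, and invoking tools in a multi-turn process. However, most existing benchmarks assume static environments with fixed schemas and toolsets, neglecting both the evolutionary nature of the real world and agents' robustness to environmental changes. This paper studies an important problem: how to evolve the agent environment in a scalable and controllable way. It proposes ProEvolve, a graph-based framework that makes environment evolution programmable. At its core, a typed relational graph provides a unified, explicit representation of the environment: data, tools, and schema. Under this formalism, adding, removing, or modifying capabilities is expressed as graph transformations that coherently propagate updates across tools, schemas, and data access. Building on this, ProEvolve (1) programs the evolutionary dynamics as graph transformations to generate environments automatically, and (2) instantiates task sandboxes via subgraph sampling and programming. ProEvolve is validated by evolving a single environment into 200 environments and 3,000 task sandboxes.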
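The abstract describes the mechanism only at a high level, so the following is a minimal sketch of the idea, not ProEvolve's actual implementation. It assumes a toy typed graph with three node types (`table`, `column`, `tool`) and two relations (`has_column`, `reads`). All class, method, and node names are hypothetical. The propagation policy is also an assumption: removing a column deletes the tools that read it, whereas the paper may instead modify them.

```python
from dataclasses import dataclass, field
import random


@dataclass
class TypedGraph:
    """Toy typed relational graph (hypothetical, not the paper's data model)."""
    nodes: dict = field(default_factory=dict)  # node name -> node type
    edges: set = field(default_factory=set)    # (source, relation, target)

    def add_node(self, name, node_type):
        self.nodes[name] = node_type

    def add_edge(self, source, relation, target):
        # Both endpoints must already exist so the graph stays consistent.
        assert source in self.nodes and target in self.nodes
        self.edges.add((source, relation, target))

    def remove_node(self, name):
        """Graph transformation: delete a node and propagate the change."""
        if name not in self.nodes:
            return
        # Assumed policy: tools that read the removed node are removed too.
        dependents = [s for (s, r, t) in self.edges if t == name and r == "reads"]
        # Columns owned by a removed table are removed too.
        children = [t for (s, r, t) in self.edges if s == name and r == "has_column"]
        del self.nodes[name]
        self.edges = {e for e in self.edges if name not in (e[0], e[2])}
        for other in dependents + children:
            self.remove_node(other)

    def sample_sandbox(self, num_tools, rng):
        """Subgraph sampling: pick tools, then keep the schema they depend on."""
        tools = sorted(n for n, t in self.nodes.items() if t == "tool")
        chosen = set(rng.sample(tools, min(num_tools, len(tools))))
        keep = set(chosen)
        # Columns read by the chosen tools.
        keep |= {t for (s, r, t) in self.edges if s in chosen and r == "reads"}
        # Tables that own those columns.
        keep |= {s for (s, r, t) in self.edges if r == "has_column" and t in keep}
        sub = TypedGraph()
        sub.nodes = {n: self.nodes[n] for n in keep}
        sub.edges = {e for e in self.edges if e[0] in keep and e[2] in keep}
        return sub


def build_env():
    """Build a tiny example environment: one table, two columns, two tools."""
    g = TypedGraph()
    g.add_node("orders", "table")
    g.add_node("orders.id", "column")
    g.add_node("orders.status", "column")
    g.add_edge("orders", "has_column", "orders.id")
    g.add_edge("orders", "has_column", "orders.status")
    g.add_node("get_order", "tool")
    g.add_node("get_status", "tool")
    g.add_edge("get_order", "reads", "orders.id")
    g.add_edge("get_status", "reads", "orders.id")
    g.add_edge("get_status", "reads", "orders.status")
    return g


if __name__ == "__main__":
    evolved = build_env()
    evolved.remove_node("orders.status")  # one evolution step
    print(sorted(evolved.nodes))
    sandbox = build_env().sample_sandbox(1, random.Random(0))
    print(sorted(sandbox.nodes))
```

In this sketch a single call to `remove_node` keeps tools, schema, and data access consistent, which is the property the abstract attributes to graph transformations. A sandbox is simply the dependency closure of a sampled set of tools.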

Related papers