Fugu-MT Paper Translation (Abstract): APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

Paper overview: APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

arxiv url: http://arxiv.org/abs/2605.21240v1
Date: Wed, 20 May 2026 14:29:27 GMT
Status: Translation complete
System update date: 2026-05-21 19:19:56.723887
Title: APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents
Authors: Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi
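Abstract summary: Self-evolving agents learn by accumulating memory and reflection across episodes, without requiring model-weight updates. As memory grows, behavior concentrates around familiar high-reward routines, reducing the chance of discovering better alternatives. The authors propose Autonomous Policy EXploration (APEX), which builds and maintains an explicit strategy space through a strategy map.
Reference score (independently computed attention score): 54.213455157510445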
License: http://creativecommons.org/licenses/by/4.0/
Abstract: LLM agents have shown strong performance across a wide range of complex tasks, including interactive environments that require long-horizon decision making. However, these agents cannot learn on the fly at test time. Self-evolving agents address this by accumulating memory and reflection across episodes rather than requiring model-weight updates. Yet these agents often suffer from exploration collapse: as memory grows, behavior concentrates around familiar high-reward routines, reducing the chance of discovering better alternatives. To address this problem, we propose Autonomous Policy EXploration (APEX), which builds and maintains an explicit strategy space through a strategy map, a directed acyclic graph of milestones with prerequisite dependency edges. In APEX, Fork Discovery expands the map with evidence-grounded unexplored directions, while Policy Selection balances exploration and exploitation during planning. Evaluated on nine Jericho text-adventure games and WebArena, a realistic web interaction benchmark, APEX outperforms all baselines. Extensive ablations validate each component's contribution and show robustness across diverse settings, demonstrating APEX's effectiveness for sustained exploration in self-evolving agents.
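To make the strategy-map idea in the abstract concrete, here is a minimal Python sketch of a directed acyclic graph of milestones with prerequisite edges. It is an illustration under stated assumptions, not the authors' implementation: the class and method names (`Milestone`, `StrategyMap`, `add_milestone`, `available`, `select`) and the example milestone names are hypothetical. The abstract does not say how Policy Selection scores candidates, so the sketch substitutes a standard UCB1 bonus as a familiar stand-in for balancing exploration and exploitation. `add_milestone` plays the role of adding a newly discovered fork to the map.

```python
import math
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Milestone:
    """One node of the strategy map (field names are hypothetical)."""
    name: str
    prerequisites: set[str] = field(default_factory=set)
    visits: int = 0            # episodes that attempted this milestone
    total_reward: float = 0.0  # summed reward observed for those attempts


class StrategyMap:
    """Directed acyclic graph of milestones with prerequisite edges."""

    def __init__(self) -> None:
        self.nodes: dict[str, Milestone] = {}

    def add_milestone(self, name: str, prerequisites: Iterable[str] = ()) -> None:
        """Add a new node, e.g. an unexplored direction found by fork discovery.

        A new node may only depend on nodes that already exist, and existing
        nodes are never rewired, so the graph stays acyclic by construction.
        """
        prereqs = set(prerequisites)
        if name in self.nodes:
            raise ValueError(f"milestone already exists: {name}")
        missing = prereqs - set(self.nodes)
        if missing:
            raise KeyError(f"unknown prerequisites: {sorted(missing)}")
        self.nodes[name] = Milestone(name, prereqs)

    def record_outcome(self, name: str, reward: float) -> None:
        """Store the reward observed after an episode pursued `name`."""
        node = self.nodes[name]
        node.visits += 1
        node.total_reward += reward

    def available(self, completed: set[str]) -> list[str]:
        """Milestones not yet completed whose prerequisites are all completed."""
        return sorted(
            name
            for name, node in self.nodes.items()
            if name not in completed and node.prerequisites <= completed
        )

    def select(self, completed: set[str], c: float = 1.0) -> str:
        """Pick the next milestone with a UCB1 score (an assumed stand-in)."""
        candidates = self.available(completed)
        if not candidates:
            raise ValueError("no milestone is currently reachable")
        total_visits = sum(node.visits for node in self.nodes.values())

        def score(name: str) -> float:
            node = self.nodes[name]
            if node.visits == 0:
                return math.inf  # always try an unexplored direction first
            mean = node.total_reward / node.visits
            return mean + c * math.sqrt(math.log(total_visits) / node.visits)

        return max(candidates, key=score)


if __name__ == "__main__":
    smap = StrategyMap()
    smap.add_milestone("get_lamp")
    smap.add_milestone("open_door")
    smap.add_milestone("enter_cellar", ["get_lamp", "open_door"])
    smap.record_outcome("get_lamp", 10.0)
    print(smap.select(completed=set()))
```

The infinite score for unvisited milestones is one simple way to counter the exploration collapse the abstract describes: a familiar high-reward routine cannot crowd out a direction that has never been tried. Setting `c=0` reduces selection to pure exploitation of the best observed mean.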


Related papers