Fugu-MT Paper Translation (Abstract): Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents

Paper overview: Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents

arxiv url: http://arxiv.org/abs/2605.10663v1
Date: Mon, 11 May 2026 14:43:56 GMT
Status: Translation complete
Last updated in system: 2026-05-13 02:24:05.574751
Title: Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents
Authors: Zhiyuan Fan, Wenwei Jin, Feng Zhang, Bin Li, Yihong Dong, Yao Hu, Jiawei Li
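Abstract summary: Self-evolving agents aim to overcome the static nature of large language models by distilling reusable experience from past interactions. This paper proposes Evolving-RL, an efficient algorithmic framework that jointly improves the experience extraction and utilization capabilities required for self-evolution. Experiments on ALFWorld and Mind2Web show that Evolving-RL effectively enhances LLMs' ability to extract and reuse experience.
Reference score (attention score computed independently by the site): 31.6974589324286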
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Experience-driven self-evolving agents aim to overcome the static nature of large language models by distilling reusable experience from past interactions, thus enabling adaptation to novel tasks at deployment time. This process places substantial demands on the foundation model's capacities for abstraction, generalization, and in-context learning. However, most existing studies focus primarily on system-level design choices, such as how experience is represented and managed, neglecting the inherent capabilities of the underlying model. While some recent works have started to optimize the experience utilization stage via reinforcement learning, they still fail to treat self-evolution as a unified process to be jointly optimized. To this end, we propose Evolving-RL, an efficient algorithmic framework that jointly improves the experience extraction and utilization capabilities required for self-evolution. Specifically, we center the learning process on experience extraction and evaluation, using the two supervisory signals derived from evaluation to optimize the extractor and solver separately and thus enable their coordinated co-evolution. Experiments on ALFWorld and Mind2Web show that Evolving-RL effectively enhances LLMs' ability to extract and reuse experience, leading to strong performance gains on out-of-distribution tasks (up to 98.7% relative improvement over the GRPO baseline on ALFWorld unseen tasks and 35.8% on Mind2Web), and these gains are fully unlocked only through the coordinated co-evolution of experience extraction and utilization. Furthermore, Evolving-RL inherently functions as an experience-augmented RL algorithm. By internalizing reusable experience patterns directly into model parameters, it achieves remarkable performance gains over standard baselines on both seen and unseen tasks, even in the absence of test-time experience accumulation.
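The "relative improvement" figures quoted in the abstract are ratios against the baseline score, not absolute percentage-point gains. A minimal sketch of that arithmetic, assuming the usual definition (improved - baseline) / baseline; the function name and the example success rates are hypothetical and are not taken from the paper:

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative gain of `improved` over `baseline`, in percent.

    Assumes the common definition (improved - baseline) / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical success rates, chosen only to illustrate the arithmetic.
hypothetical_baseline = 0.40
hypothetical_improved = 0.60
gain = relative_improvement(hypothetical_baseline, hypothetical_improved)
print(f"{gain:.1f}% relative improvement")  # prints "50.0% relative improvement"
```

Under this definition, a 98.7% relative improvement corresponds to a score roughly 1.99 times the baseline score.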
