Fugu-MT Paper Translation (Abstract): AEL: Agent Evolving Learning for Open-Ended Environments

Paper overview: AEL: Agent Evolving Learning for Open-Ended Environments

arxiv url: http://arxiv.org/abs/2604.21725v1
Date: Thu, 23 Apr 2026 14:29:25 GMT
Status: Translation complete
Last updated in system: 2026-04-24 14:40:06.598234
Title: AEL: Agent Evolving Learning for Open-Ended Environments
Title (reference translation): AEL: Agents That Promote Learning in Open-Ended Environments
Authors: Wujiang Xu, Jiaojiao Han, Minghao Guo, Kai Mei, Xi Zhu, Han Zhang, Dimitris N. Metaxas
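Abstract summary: This paper introduces Agent Evolving Learning (AEL), a two-timescale framework that addresses the obstacle of how an agent should use what it has remembered. AEL achieves a Sharpe ratio of 2.13 ± 0.47, outperforming five self-improving methods. This indicates that the bottleneck in agent self-improvement is self-diagnosing how to use experience rather than adding architectural complexity.
Reference score (independently computed attention score): 43.56685432981852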
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM agents increasingly operate in open-ended environments spanning hundreds of sequential episodes, yet they remain largely stateless: each task is solved from scratch without converting past experience into better future behavior. The central obstacle is not what to remember but how to use what has been remembered, including which retrieval policy to apply, how to interpret prior outcomes, and when the current strategy itself must change. We introduce Agent Evolving Learning (AEL), a two-timescale framework that addresses this obstacle. At the fast timescale, a Thompson Sampling bandit learns which memory retrieval policy to apply at each episode; at the slow timescale, LLM-driven reflection diagnoses failure patterns and injects causal insights into the agent's decision prompt, giving it an interpretive frame for the evidence it retrieves. On a sequential portfolio benchmark (10 sector-diverse tickers, 208 episodes, 5 random seeds), AEL achieves a Sharpe ratio of 2.13 ± 0.47, outperforming five published self-improving methods and all non-LLM baselines while maintaining the lowest variance among all LLM-based approaches. A nine-variant ablation reveals a "less is more" pattern: memory and reflection together produce a 58% cumulative improvement over the stateless baseline, yet every additional mechanism we test (planner evolution, per-tool selection, cold-start initialization, skill extraction, and three credit assignment methods) degrades performance. This demonstrates that the bottleneck in agent self-improvement is self-diagnosing how to use experience rather than adding architectural complexity. Code and data: https://github.com/WujiangXu/AEL.
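The fast-timescale component described in the abstract can be illustrated with a minimal sketch of a Beta-Bernoulli Thompson Sampling bandit choosing among retrieval policies. This is an illustration under stated assumptions, not the authors' implementation: the policy names (`recent`, `similar`, `none`), the binary per-episode reward, and the simulated success rates are all hypothetical, and the abstract does not specify the reward model AEL actually uses.

```python
import random


class ThompsonSamplingBandit:
    """Beta-Bernoulli Thompson Sampling over a fixed set of arms."""

    def __init__(self, arms, seed=None):
        self.arms = list(arms)
        # Beta(1, 1) (uniform) prior on each arm's success probability.
        self.alpha = {arm: 1.0 for arm in self.arms}
        self.beta = {arm: 1.0 for arm in self.arms}
        self.rng = random.Random(seed)

    def select(self):
        # Draw one sample from each arm's posterior and play the largest draw.
        samples = {
            arm: self.rng.betavariate(self.alpha[arm], self.beta[arm])
            for arm in self.arms
        }
        return max(samples, key=samples.get)

    def update(self, arm, reward):
        # Assumption: reward is binary (True = the episode counted as a success).
        if reward:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0

    def posterior_mean(self, arm):
        return self.alpha[arm] / (self.alpha[arm] + self.beta[arm])


def run_simulation(num_episodes=208, seed=0):
    # Hypothetical retrieval policies with made-up success rates,
    # used only to drive the simulation.
    true_success = {"recent": 0.45, "similar": 0.65, "none": 0.40}
    bandit = ThompsonSamplingBandit(true_success, seed=seed)
    env_rng = random.Random(seed + 1)
    counts = {arm: 0 for arm in bandit.arms}
    for _ in range(num_episodes):
        arm = bandit.select()                        # pick a retrieval policy
        reward = env_rng.random() < true_success[arm]  # simulated outcome
        bandit.update(arm, reward)                   # update that arm's posterior
        counts[arm] += 1
    return bandit, counts


if __name__ == "__main__":
    bandit, counts = run_simulation()
    for arm in bandit.arms:
        print(arm, counts[arm], round(bandit.posterior_mean(arm), 3))
```

Sampling from the posterior, rather than always playing the arm with the highest mean, keeps some exploration of under-tried policies while shifting most episodes toward policies that have paid off so far.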
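Abstract (reference translation): LLM agents are increasingly deployed in open-ended environments that span hundreds of consecutive episodes, yet they are mostly stateless: each task is solved from scratch, with no conversion of past experience into better future behavior. The central obstacle is not what to remember but how to use what has been remembered, including which retrieval policy to apply, how to interpret prior outcomes, and when the current strategy itself needs to change. We introduce Agent Evolving Learning (AEL), a two-timescale framework that addresses this obstacle. At the fast timescale, a Thompson Sampling bandit learns which memory retrieval policy to apply in each episode; at the slow timescale, LLM-driven reflection diagnoses failure patterns and injects causal insights into the agent's decision prompt, giving it an interpretive frame for the evidence it retrieves. On a sequential portfolio benchmark (10 sector-diverse tickers, 208 episodes, 5 random seeds), AEL achieves a Sharpe ratio of 2.13 ± 0.47, outperforming five self-improving methods and all non-LLM baselines while maintaining the lowest variance among all LLM-based approaches. Memory and reflection together yield a 58% cumulative improvement over the stateless baseline, yet every additional mechanism tested (planner evolution, per-tool selection, cold-start initialization, skill extraction, and three credit assignment methods) degrades performance. This shows that the bottleneck in agent self-improvement is self-diagnosing how to use experience rather than adding architectural complexity. Code and data: https://github.com/WujiangXu/AEL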
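For readers unfamiliar with the headline metric, the following is a minimal sketch of how a Sharpe ratio and a "mean ± spread over seeds" summary might be computed. Several things here are assumptions rather than details from the paper: the annualization factor (`periods_per_year=52`), the zero risk-free rate, the use of the sample standard deviation, and the reading of "±" as the standard deviation across seeds.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=52):
    """Annualized Sharpe ratio of a sequence of per-period returns.

    Assumptions: risk_free_rate is expressed per period, and
    periods_per_year=52 is purely illustrative.
    """
    excess = [r - risk_free_rate for r in returns]
    mean = statistics.fmean(excess)
    std = statistics.stdev(excess)  # sample standard deviation
    if std == 0:
        raise ValueError("Sharpe ratio is undefined for zero-variance returns")
    return mean / std * math.sqrt(periods_per_year)


def summarize_across_seeds(per_seed_returns, **kwargs):
    """Return (mean, standard deviation) of the Sharpe ratio over several runs."""
    ratios = [sharpe_ratio(r, **kwargs) for r in per_seed_returns]
    return statistics.fmean(ratios), statistics.stdev(ratios)


if __name__ == "__main__":
    # Made-up per-episode returns for two hypothetical seeds.
    runs = [
        [0.01, 0.03, -0.01, 0.02],
        [0.02, -0.02, 0.01, 0.03],
    ]
    mean, spread = summarize_across_seeds(runs)
    print(f"{mean:.2f} ± {spread:.2f}")
```

Under this reading, a lower spread means the method's risk-adjusted return is more consistent from one random seed to the next, which is the sense in which the abstract reports the lowest variance among LLM-based approaches.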

Related papers