Fugu-MT Paper Translation (Summary): FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

Paper Summary: FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

arxiv url: http://arxiv.org/abs/2605.16233v1
Date: Fri, 15 May 2026 17:42:49 GMT
Status: Translation complete
System update date: 2026-05-18 17:44:16.359662
Title: FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast
Authors: Igor Bogdanov, Chung-Horng Lung, Thomas Kunz, Jie Gao, Adrian Taylor, Marzia Zaman,
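Abstract summary: FORGE wraps a Reflexion-style inner loop in which a dedicated reflection agent converts failed trajectories into reusable knowledge artifacts. We evaluate on CybORG CAGE-2, a network-defense POMDP, at a 30-step horizon against the B-line attacker. Compared against both a zero-shot baseline and a Reflexion baseline (isolated single-stream learning), FORGE improves average evaluation return by 1.7-7.7$\times$ over zero-shot.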
Reference score (site-computed attention score): 3.774094352572544
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents. FORGE wraps a Reflexion-style inner loop, where a dedicated reflection agent (using the same underlying LLM, no distillation from a stronger model) converts failed trajectories into reusable knowledge artifacts: textual heuristics (Rules), few-shot demonstrations (Examples), or both (Mixed). An outer loop propagates the best-performing instance's memory to the population between stages and freezes converged instances via a graduation criterion. We evaluate on CybORG CAGE-2, a stochastic network-defense POMDP at a 30-step horizon against the B-line attacker, where all four tested LLM families (Gemini-2.5-Flash-Lite, Grok-4-Fast, Llama-4-Maverick, Qwen3-235B) exhibit strongly negative, heavy-tailed zero-shot rewards. Compared against both a zero-shot baseline and a Reflexion baseline (isolated single-stream learning), FORGE improves average evaluation return by 1.7-7.7$\times$ over zero-shot and by 29-72% over Reflexion in all 12 model-representation conditions, reducing major-failure rates (below $-100$) to as low as $\sim$1%. We find that (1) population broadcast is the critical mechanism, with a no-graduation ablation confirming that broadcast carries the performance gains while graduation primarily saves compute; (2) Examples achieves the strongest returns for three of four models, while Rules offers the best cost-reliability profile with $\sim$40% fewer tokens; and (3) weaker baseline models benefit disproportionately, suggesting FORGE may mitigate capability gaps rather than amplify strong models. All evidence is confined to CAGE-2 B-line; cross-family findings are directional evidence.
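The inner loop described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the authors' implementation. Every name in it is hypothetical: `Memory`, `Representation`, `reflect`, `inner_loop`, the `run_episode` and `reflector` callables, the reflection instructions, and the `fail_threshold` cutoff for what counts as a failed trajectory. The sketch only illustrates the idea that failed episodes are turned into Rules, Examples, or both, and that this memory is injected into the prompt text instead of changing model weights. In the paper, the reflector uses the same underlying LLM as the agent.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Representation(Enum):
    RULES = "rules"        # textual heuristics
    EXAMPLES = "examples"  # few-shot demonstrations
    MIXED = "mixed"        # both kinds of artifact


@dataclass
class Memory:
    rules: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)

    def render(self, rep: Representation) -> str:
        """Render the stored artifacts as a text block for the agent prompt."""
        parts = []
        if rep in (Representation.RULES, Representation.MIXED) and self.rules:
            parts.append("Rules:\n" + "\n".join(f"- {r}" for r in self.rules))
        if rep in (Representation.EXAMPLES, Representation.MIXED) and self.examples:
            parts.append("Examples:\n" + "\n\n".join(self.examples))
        return "\n\n".join(parts)


# Hypothetical interfaces (assumptions, not from the paper):
RunEpisode = Callable[[str], tuple[str, float]]  # prompt -> (trajectory text, episode return)
Reflector = Callable[[str, str], str]            # (instruction, trajectory) -> artifact text


def reflect(traj: str, reflector: Reflector, memory: Memory,
            rep: Representation) -> Memory:
    """Convert one failed trajectory into new artifacts; returns a new Memory."""
    new = Memory(list(memory.rules), list(memory.examples))
    if rep in (Representation.RULES, Representation.MIXED):
        new.rules.append(reflector("State one reusable heuristic that avoids this failure.", traj))
    if rep in (Representation.EXAMPLES, Representation.MIXED):
        new.examples.append(reflector("Rewrite this episode as a corrected demonstration.", traj))
    return new


def inner_loop(base_prompt: str, run_episode: RunEpisode, reflector: Reflector,
               memory: Memory, rep: Representation, episodes: int,
               fail_threshold: float) -> Memory:
    """Reflexion-style loop: memory lives in the prompt and no weights change."""
    for _ in range(episodes):
        block = memory.render(rep)
        prompt = f"{base_prompt}\n\n{block}" if block else base_prompt
        traj, ret = run_episode(prompt)
        if ret < fail_threshold:  # only failed episodes trigger reflection
            memory = reflect(traj, reflector, memory, rep)
    return memory
```

Suppose a stub `run_episode` returns -150 until the prompt contains a `Rules:` block. Then one call with `Representation.RULES` and `fail_threshold=-100.0` adds a single rule and stops reflecting, because the later episodes no longer fail.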
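The outer loop of FORGE can also be sketched as a minimal, hypothetical Python example. It combines population broadcast between stages with a graduation criterion that freezes converged instances. The sketch makes these assumptions:

- Each instance's stage of inner-loop learning is abstracted as a hypothetical `stage_fn(i, memory)` that returns the updated memory and an evaluation score.
- Graduation is modeled as stage-over-stage improvement falling below `epsilon`. The abstract does not state the exact criterion.
- Graduated instances neither train further nor receive broadcasts.

`Instance`, `forge_outer_loop`, and all parameters are illustrative names. Passing `graduation=False` loosely corresponds to the no-graduation ablation.

```python
from dataclasses import dataclass
from typing import Callable

Memory = tuple[str, ...]  # immutable, so broadcast copies share no mutable state

# Hypothetical: one stage of inner-loop learning for instance i,
# returning (updated memory, evaluation score).
StageFn = Callable[[int, Memory], tuple[Memory, float]]


@dataclass
class Instance:
    memory: Memory = ()
    score: float = float("-inf")
    graduated: bool = False


def forge_outer_loop(stage_fn: StageFn, population_size: int, stages: int,
                     epsilon: float = 1.0, graduation: bool = True) -> list[Instance]:
    pop = [Instance() for _ in range(population_size)]
    for _ in range(stages):
        # 1. Train and evaluate every instance that has not graduated.
        for i, inst in enumerate(pop):
            if inst.graduated:
                continue
            prev = inst.score
            inst.memory, inst.score = stage_fn(i, inst.memory)
            # Assumed graduation rule: freeze once improvement drops below epsilon.
            if graduation and inst.score - prev < epsilon:
                inst.graduated = True
        # 2. Population broadcast: copy the best memory to every active instance.
        best = max(pop, key=lambda x: x.score)
        for inst in pop:
            if not inst.graduated:
                inst.memory = best.memory
        if all(inst.graduated for inst in pop):
            break  # every instance has converged; stop spending compute
    return pop
```

Consider a toy `stage_fn` whose scores rise by exactly 1 per stage. With `epsilon=1.5`, all three instances freeze after the second stage, for 6 `stage_fn` calls. With `graduation=False`, all 9 calls run. In this sketch, the skipped calls are where compute savings would appear.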


Related Papers