Fugu-MT Paper Translation (Abstract): Internalizing Agency from Reflective Experience

Paper abstract: Internalizing Agency from Reflective Experience

arxiv url: http://arxiv.org/abs/2603.16843v1
Date: Tue, 17 Mar 2026 17:50:47 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:07.461021
Title: Internalizing Agency from Reflective Experience
Title (reference translation): Internalizing Agency from Reflective Experience
Authors: Rui Ge, Yichao Fu, Yuyang Qian, Junda Su, Yiming Zhao, Peng Zhao, Hao Zhang
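Abstract summary: LEAFE is a framework that internalizes recovery agency from reflective experience. It consistently improves Pass@1 over the base model and achieves higher Pass@k than outcome-driven baselines.
Reference score (independently computed attention score): 20.650609947690196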
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models are increasingly deployed as autonomous agents that must plan, act, and recover from mistakes through long-horizon interaction with environments that provide rich feedback. However, prevailing outcome-driven post-training methods (e.g., RL with verifiable rewards) primarily optimize final success signals, leaving rich environment feedback underutilized. Consequently, they often lead to distribution sharpening: the policy becomes better at reproducing a narrow set of already-successful behaviors, while failing to improve the feedback-grounded agency needed to expand problem-solving capacity (e.g., Pass@k) in long-horizon settings. To address this, we propose LEAFE (Learning Feedback-Grounded Agency from Reflective Experience), a framework that internalizes recovery agency from reflective experience. Specifically, during exploration, the agent summarizes environment feedback into actionable experience, backtracks to earlier decision points, and explores alternative branches with revised actions. We then distill these experience-guided corrections into the model through supervised fine-tuning, enabling the policy to recover more effectively in future interactions. Across a diverse set of interactive coding and agentic tasks under fixed interaction budgets, LEAFE consistently improves Pass@1 over the base model and achieves higher Pass@k than outcome-driven baselines (GRPO) and experience-based methods such as Early Experience, with gains of up to 14% on Pass@128.
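The exploration loop described in the abstract (summarize feedback, backtrack to an earlier decision point, retry with a revised action, then collect corrections for fine-tuning) can be illustrated with a toy sketch. This is not the authors' implementation. The environment, the stand-in policy, and every name here (`TARGET`, `env_feedback`, `base_policy`, `explore`) are hypothetical, and keeping only corrections that appear in the final successful trajectory is an assumption made for this example.

```python
ACTIONS = ["a", "b", "c"]
TARGET = ["b", "a", "c"]  # hidden solution of the toy task (hypothetical)


def env_feedback(actions):
    """Toy environment: index of the first wrong step, or None on success."""
    for i, (got, want) in enumerate(zip(actions, TARGET)):
        if got != want:
            return i
    return None


def base_policy(prefix, avoid):
    """Stand-in policy: first action not ruled out by experience.

    `prefix` is ignored here; a real policy would condition on it.
    """
    for action in ACTIONS:
        if action not in avoid:
            return action
    raise RuntimeError("no action left to try")


def explore(budget=10):
    experience = {}   # step index -> actions known to fail at that step
    corrections = []  # (prefix, revised action) candidates for distillation
    trajectory = []
    for _ in range(budget):
        # Roll out from the current decision point to the end of the task.
        while len(trajectory) < len(TARGET):
            step = len(trajectory)
            trajectory.append(base_policy(trajectory, experience.get(step, set())))
        bad = env_feedback(trajectory)
        if bad is None:
            # Assumption: keep only corrections that the final success confirms.
            kept = [(p, a) for p, a in corrections if trajectory[len(p)] == a]
            return trajectory, kept
        # Summarize feedback into experience: this action fails at this step.
        experience.setdefault(bad, set()).add(trajectory[bad])
        # Backtrack to the failing decision point and branch with a revised action.
        trajectory = trajectory[:bad]
        revised = base_policy(trajectory, experience[bad])
        corrections.append((tuple(trajectory), revised))
        trajectory.append(revised)
    return None, []


trajectory, sft_pairs = explore()
print(trajectory)
print(sft_pairs)
```

In this toy setting the returned `sft_pairs` play the role of the supervised fine-tuning data: each pair maps a trajectory prefix to the corrected action at that point. The interaction budget is modeled simply as a cap on the number of environment evaluations.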
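Abstract (reference translation): Large language models are increasingly deployed as autonomous agents that must plan, act, and recover from failures through long-horizon interaction with environments that provide rich feedback. However, common outcome-driven post-training methods (for example, RL with verifiable rewards) mainly optimize the final success signal and leave rich environment feedback unused. As a result, they often lead to distribution sharpening: the policy gets better at reproducing a narrow set of behaviors that already succeed, while failing to improve the feedback-grounded agency needed to expand problem-solving capacity (for example, Pass@k) in long-horizon settings. We therefore propose LEAFE (Learning Feedback-Grounded Agency from Reflective Experience), a framework that internalizes recovery agency from reflective experience. Specifically, during exploration the agent summarizes environment feedback into actionable experience, backtracks to earlier decision points, and explores alternative branches with revised actions. These experience-guided corrections are then distilled into the model through supervised fine-tuning, so that the policy recovers more effectively in future interactions. Across a diverse set of interactive coding and agentic tasks under fixed interaction budgets, LEAFE consistently improves Pass@1 over the base model and achieves higher Pass@k than outcome-driven baselines (GRPO) and experience-based methods such as Early Experience, with gains of up to 14% on Pass@128.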
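The abstract reports results as Pass@1 and Pass@k but does not say here how those numbers are computed. A commonly used choice is the unbiased estimator introduced with the HumanEval benchmark: given n sampled attempts per task of which c succeed, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below implements that formula. Whether the paper uses this exact estimator is an assumption, and the function name `pass_at_k` is chosen for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c are correct.

    Returns the probability that at least one of k samples drawn
    without replacement from the n attempts is correct.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 4 attempts, 1 correct, drawing 2 of them.
print(pass_at_k(n=4, c=1, k=2))
```

Averaging this value over all tasks gives the benchmark-level score. With k = 1 it reduces to the plain success rate c / n.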

Related papers