Fugu-MT Paper Translation (Abstract): Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

Paper overview: Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

arxiv url: http://arxiv.org/abs/2606.04703v1
Date: Wed, 03 Jun 2026 10:30:09 GMT
Status: Translation complete
Last updated in system: 2026-06-04 20:44:18.684291
Title: Rethinking Continual Experience Internalization for Self-Evolving LLM Agents
Title (reference translation): Rethinking Continual Experience Internalization for Self-Evolving LLM Agents
Authors: Jingwen Chen, Wenkai Yang, Shengda Fan, Wenbo Nie, Chenxing Sun, Shaodong Zheng, Yangen Hu, Lu Pan, Ke Zeng, Yankai Lin
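Abstract summary: Principle-level experience is found to be more durable than instance-level experience. Step-wise injection significantly outperforms global injection by aligning experience with intermediate decision states. Off-policy context distillation on high-quality teacher trajectories provides a substantially more stable training signal than on-policy context distillation.
Reference score (independently computed attention score): 36.80404778289742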
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Experience internalization converts contextual experience from past interactions into reusable parametric capability, offering a promising path toward continual learning in large language models (LLMs). While prior work has predominantly focused on single-iteration transfer, we discover that under multi-iteration experience learning, existing methods suffer from a progressive capability collapse rather than compounding improvement. We systematically examine this failure through three vital dimensions of experience internalization: (1) Experience Granularity: We find that principle-level experience is more durable than instance-level experience, as it effectively abstracts transferable strategies away from trajectory-specific details. (2) Experience Injection Pattern: Our analysis reveals that step-wise injection significantly outperforms global injection by aligning experience with intermediate decision states, a property that is critical for long-horizon tool use. (3) Internalization Regime: We demonstrate that off-policy context-distillation on high-quality teacher trajectories provides a substantially more stable training signal than on-policy context-distillation, which is inherently limited by local corrections on student-induced flawed states. Together, these insights yield a simple yet robust recipe for stable and sustainable experience internalization, providing concrete guidance for engineering self-evolving and continually learning LLMs.
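Abstract (reference translation): Experience internalization converts contextual experience from past interactions into reusable parametric capability, offering a promising path toward continual learning in large language models (LLMs). Prior work has focused mainly on single-iteration transfer, but under multi-iteration experience learning, existing methods are found to suffer from progressive capability collapse rather than compounding improvement. This failure is examined along three dimensions of experience internalization. (1) Experience granularity: principle-level experience is more durable than instance-level experience because it abstracts transferable strategies away from trajectory-specific details. (2) Experience injection pattern: step-wise injection significantly outperforms global injection by aligning experience with intermediate decision states, a property critical for long-horizon tool use. (3) Internalization regime: off-policy context distillation on high-quality teacher trajectories provides a substantially more stable training signal than on-policy context distillation, which is inherently limited to local corrections on flawed states induced by the student. Together, these insights yield a simple yet robust recipe for stable and sustainable experience internalization, providing concrete guidance for engineering self-evolving and continually learning LLMs.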
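
The contrast between global and step-wise injection described in the abstract can be illustrated with a minimal sketch. The abstract does not say how injection is implemented, so everything below is an assumption made for illustration: the `Step` type, the function names, the prompt layout, and the `retrieve` callback are hypothetical and are not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One intermediate decision state in an agent trajectory (hypothetical type)."""
    observation: str


def global_injection(task: str, steps: List[Step], experience: List[str]) -> List[str]:
    """Attach all experience once, to the task prompt at the start of the episode."""
    header = task + "\nExperience:\n" + "\n".join(experience)
    # Later step prompts carry only the raw observation.
    return [header] + [s.observation for s in steps]


def stepwise_injection(
    task: str,
    steps: List[Step],
    retrieve: Callable[[Step], List[str]],
) -> List[str]:
    """Attach experience selected for each intermediate state to that state's prompt."""
    prompts = [task]
    for s in steps:
        hints = retrieve(s)  # assumed: returns experience relevant to this state
        prompts.append(s.observation + "\nExperience:\n" + "\n".join(hints))
    return prompts
```

Under these assumptions, the only difference is where the experience text lands: once in front of the whole episode, or next to each decision state it is meant to inform.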

Related papers