Fugu-MT Paper Translation (Summary): RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

Paper summary: RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

arxiv url: http://arxiv.org/abs/2603.08561v2
Date: Wed, 11 Mar 2026 12:33:30 GMT
Status: Translation complete
System update date: 2026-03-12 14:12:44.162606
Title: RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback
Title (reference translation): RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback
Authors: Xiaoying Zhang, Zichen Liu, Yipeng Zhang, Xia Hu, Wenqi Shao
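Abstract summary: Large language model (LLM)-based agents trained with reinforcement learning (RL) have shown strong potential on complex interactive tasks. We introduce RetroAgent, an online RL framework that enables agents to master complex interactive environments not just by solving, but by evolving.
Reference score (independently computed attention score): 54.39884046754265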
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM)-based agents trained with reinforcement learning (RL) have shown strong potential on complex interactive tasks. However, standard RL paradigms favor static problem-solving over continuous adaptation: agents often converge to suboptimal strategies due to insufficient exploration, while learned knowledge remains implicit within parameters rather than explicitly retrievable, limiting effective experiential learning. To address these limitations, we introduce RetroAgent, an online RL framework that empowers agents to master complex interactive environments not just by solving, but by evolving. Concretely, RetroAgent features a hindsight self-reflection mechanism that produces dual intrinsic feedback: (1) intrinsic numerical feedback that tracks incremental subtask completion relative to prior attempts, rewarding promising explorations, and (2) intrinsic language feedback that distills reusable lessons into a memory buffer, retrieved via our proposed Similarity & Utility-Aware Upper Confidence Bound (SimUtil-UCB) strategy balancing relevance, utility, and exploration to effectively leverage past experiences. Extensive experiments on two model families across four challenging agentic tasks demonstrate that RetroAgent significantly outperforms existing methods, achieving state-of-the-art results -- e.g., surpassing Group Relative Policy Optimization (GRPO)-trained agents by +18.3% on ALFWorld, +15.4% on WebShop, +27.1% on Sokoban, and +8.9% on MineSweeper -- while exhibiting strong test-time adaptation and generalization to out-of-distribution scenarios.
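Abstract (reference translation): Large language model (LLM)-based agents trained with reinforcement learning (RL) have shown strong potential on complex interactive tasks. However, standard RL paradigms favor static problem-solving over continuous adaptation, and agents often converge to suboptimal strategies due to insufficient exploration. To address these limitations, we introduce RetroAgent, an online RL framework. Concretely, RetroAgent features a hindsight self-reflection mechanism that produces dual intrinsic feedback: (1) intrinsic numerical feedback that tracks incremental subtask completion relative to prior attempts, and (2) intrinsic language feedback that distills reusable lessons into a memory buffer, retrieved via our proposed Similarity & Utility-Aware Upper Confidence Bound (SimUtil-UCB) strategy, which balances relevance, utility, and exploration to effectively leverage past experiences. Extensive experiments on two model families across four challenging agentic tasks show that RetroAgent significantly outperforms existing methods, achieving state-of-the-art results -- e.g., surpassing Group Relative Policy Optimization (GRPO)-trained agents by +18.3% on ALFWorld, +15.4% on WebShop, +27.1% on Sokoban, and +8.9% on MineSweeper.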
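Illustrative sketch (not from the paper): the abstract names two mechanisms without giving formulas, so the Python below only shows one plausible shape for each. Everything in it is an assumption made for illustration: the `Lesson` structure, the `progress_reward` rule (reward only improvement over the best prior attempt), the additive scoring form, and the weights `alpha`, `beta`, and `c`. The exploration term is the generic UCB1-style bonus, swapped in because the actual SimUtil-UCB definition is not stated on this page.

```python
import math
from dataclasses import dataclass


@dataclass
class Lesson:
    """One memory-buffer entry (hypothetical structure, not from the paper)."""
    text: str
    embedding: list
    utility_sum: float = 0.0  # running sum of observed usefulness
    uses: int = 0             # number of times this lesson was retrieved


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def progress_reward(completed_now, best_completed_before):
    """Assumed numerical feedback: positive only when the fraction of
    completed subtasks exceeds the best prior attempt."""
    return max(0.0, completed_now - best_completed_before)


def retrieve(query_emb, memory, total_retrievals, k=1,
             alpha=1.0, beta=1.0, c=1.0):
    """Rank lessons by relevance + mean utility + UCB1-style bonus
    (assumed additive form; weights are hypothetical)."""
    def score(lesson):
        relevance = cosine(query_emb, lesson.embedding)
        utility = lesson.utility_sum / lesson.uses if lesson.uses else 0.0
        # Rarely used lessons get a larger exploration bonus.
        bonus = c * math.sqrt(
            math.log(total_retrievals + 1) / (lesson.uses + 1)
        )
        return alpha * relevance + beta * utility + bonus

    return sorted(memory, key=score, reverse=True)[:k]


memory = [
    Lesson("check the drawer first", [1.0, 0.0], utility_sum=5.0, uses=10),
    Lesson("untested lesson", [0.0, 1.0]),
]
top_default = retrieve([1.0, 0.0], memory, total_retrievals=10)
top_explore = retrieve([1.0, 0.0], memory, total_retrievals=10, c=2.0)
```

With the default weights the relevant, well-tested lesson ranks first; raising `c` lets the never-used lesson win, which is the relevance/utility/exploration trade-off the abstract describes in words.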


Related papers