Fugu-MT Paper Translation (Abstract): Complementary Reinforcement Learning

Paper overview: Complementary Reinforcement Learning

arxiv url: http://arxiv.org/abs/2603.17621v1
Date: Wed, 18 Mar 2026 11:38:01 GMT
Status: Translation complete
System update date: 2026-03-19 18:32:57.669012
Title: Complementary Reinforcement Learning
Title (reference translation): Complementary Reinforcement Learning
Authors: Dilxat Muhtar, Jiashun Liu, Wei Gao, Weixun Wang, Shaopan Xiong, Ju Huang, Siran Yang, Wenbo Su, Jiamang Wang, Ling Pan, Bo Zheng
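Abstract summary: Reinforcement Learning (RL) has emerged as a powerful paradigm for training LLM-based agents. Experience distilled from history either is stored statically or fails to co-evolve with the improving actor. Inspired by complementary learning systems in neuroscience, the authors propose Complementary RL to achieve seamless co-evolution of an experience extractor and a policy actor.
Reference score (independently computed attention score): 31.660877399506493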
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement Learning (RL) has emerged as a powerful paradigm for training LLM-based agents, yet remains limited by low sample efficiency, stemming not only from sparse outcome feedback but also from the agent's inability to leverage prior experience across episodes. While augmenting agents with historical experience offers a promising remedy, existing approaches suffer from a critical weakness: the experience distilled from history either is stored statically or fails to co-evolve with the improving actor, causing a progressive misalignment between the experience and the actor's evolving capability that diminishes its utility over the course of training. Inspired by complementary learning systems in neuroscience, we present Complementary RL to achieve seamless co-evolution of an experience extractor and a policy actor within the RL optimization loop. Specifically, the actor is optimized via sparse outcome-based rewards, while the experience extractor is optimized according to whether its distilled experiences demonstrably contribute to the actor's success, thereby evolving its experience management strategy in lockstep with the actor's growing capabilities. Empirically, Complementary RL outperforms outcome-based agentic RL baselines that do not learn from experience, achieving a 10% performance improvement in single-task scenarios and exhibiting robust scalability in multi-task settings. These results establish Complementary RL as a paradigm for efficient experience-driven agent learning.
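Abstract (reference translation): Reinforcement Learning (RL) has emerged as a powerful paradigm for training LLM-based agents, yet it remains limited by low sample efficiency, which stems not only from sparse feedback but also from the agent's inability to leverage prior experience across episodes. Experience distilled from history either is stored statically or fails to co-evolve with the improving actor, causing a progressive misalignment between the experience and the actor's evolving capability that reduces its utility over the course of training. Inspired by complementary learning systems in neuroscience, we propose Complementary RL to achieve seamless co-evolution of an experience extractor and a policy actor. Specifically, the experience extractor is optimized according to whether its distilled experiences demonstrably contribute to the actor's success, so that its experience management strategy evolves in step with the actor's growing capabilities. Empirically, Complementary RL outperforms outcome-based agentic RL baselines that do not learn from experience, achieving a 10% performance improvement in single-task scenarios and showing robust scalability in multi-task settings. These results establish Complementary RL as a paradigm for efficient experience-driven agent learning.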
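
Illustrative sketch (not from the paper): the abstract describes two reward signals, a sparse outcome reward for the actor and a reward for the experience extractor that depends on whether its distilled experiences contribute to the actor's success. The Python below is one hypothetical way to express that split. The `Rollout` type, the function names, and in particular the choice to score an experience by "success rate with it minus success rate without any experience" are assumptions made for illustration; the abstract does not specify how contribution is measured.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Rollout:
    """One episode (hypothetical record type, not from the paper)."""
    experience_id: Optional[str]  # None: the actor ran without any distilled experience
    success: bool                 # sparse outcome observed at the end of the episode


def actor_reward(rollout: Rollout) -> float:
    """Sparse outcome-based reward for the actor: 1.0 on success, else 0.0."""
    return 1.0 if rollout.success else 0.0


def extractor_rewards(rollouts: List[Rollout]) -> Dict[str, float]:
    """Assumed credit rule: success rate with an experience minus the no-experience baseline."""
    baseline_runs = [actor_reward(r) for r in rollouts if r.experience_id is None]
    baseline = mean(baseline_runs) if baseline_runs else 0.0

    # Group actor outcomes by the experience that was shown during the episode.
    outcomes: Dict[str, List[float]] = {}
    for r in rollouts:
        if r.experience_id is not None:
            outcomes.setdefault(r.experience_id, []).append(actor_reward(r))

    return {eid: mean(vals) - baseline for eid, vals in outcomes.items()}


rollouts = [
    Rollout(None, False), Rollout(None, True),   # baseline success rate 0.5
    Rollout("e1", True), Rollout("e1", True),    # experience e1: success rate 1.0
    Rollout("e2", False), Rollout("e2", True),   # experience e2: success rate 0.5
]
rewards = extractor_rewards(rollouts)
print(rewards)  # {'e1': 0.5, 'e2': 0.0}
```

Under this assumed rule, an experience that does not change the actor's success rate earns the extractor nothing, which is one simple reading of "demonstrably contribute"; the paper itself may define the signal differently.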

Related papers list