Fugu-MT Paper Translation (Abstract): Dynamic Dual-Granularity Skill Bank for Agentic RL

Paper overview: Dynamic Dual-Granularity Skill Bank for Agentic RL

arxiv url: http://arxiv.org/abs/2603.28716v1
Date: Mon, 30 Mar 2026 17:32:11 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:45.540522
Title: Dynamic Dual-Granularity Skill Bank for Agentic RL
Title (reference translation): Dynamic Dual-Granularity Skill Bank for Agentic RL
Authors: Songjun Tu, Chengdong Xu, Qichao Zhang, Yaocheng Zhang, Xiangyuan Lan, Linjing Li, Dongbin Zhao
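Abstract summary: D2Skill is a dynamic dual-granularity skill bank for agentic reinforcement learning. It organizes reusable experience into task skills for high-level guidance and step skills for fine-grained decision support and error correction.
Reference score (independently computed attention metric): 34.161117844675324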
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that organizes reusable experience into task skills for high-level guidance and step skills for fine-grained decision support and error correction. D2Skill jointly trains the policy and skill bank through paired baseline and skill-injected rollouts under the same policy, using their performance gap to derive hindsight utility signals for both skill updating and policy optimization. Built entirely from training-time experience, the skill bank is continuously expanded through reflection and maintained with utility-aware retrieval and pruning. Experiments on ALFWorld and WebShop with Qwen2.5-7B-Instruct and Qwen3-4B-Instruct-2507 show that D2Skill consistently improves success rates over skill-free baselines by 10-20 points. Further ablations and analyses show that both dual-granularity skill modeling and dynamic skill maintenance are critical to these gains, while the learned skills exhibit higher utility, transfer across evaluation settings, and introduce only modest training overhead.
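Abstract (reference translation): Agentic reinforcement learning (RL) benefits substantially from reusable experience, but existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. This paper proposes D2Skill, a dynamic dual-granularity skill bank for agentic RL. D2Skill jointly trains the policy and the skill bank through paired baseline and skill-injected rollouts under the same policy. The skill bank is continuously expanded through reflection and maintained with utility-aware retrieval and pruning. In experiments on ALFWorld and WebShop with Qwen2.5-7B-Instruct and Qwen3-4B-Instruct-2507, D2Skill consistently improves success rates over skill-free baselines by 10-20 points. Furthermore, both dual-granularity skill modeling and dynamic skill maintenance are critical to these gains, while the learned skills exhibit higher utility, transfer across evaluation settings, and introduce only modest training overhead.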
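Illustrative sketch (not the authors' implementation): the abstract describes deriving a hindsight utility signal from the performance gap between a baseline rollout and a skill-injected rollout, then using that utility for retrieval and pruning. The Python below shows one way such bookkeeping could look. The class and method names, the exponential-moving-average update, the pruning threshold, and the minimum-use rule are all assumptions made for illustration; the abstract does not specify them, and retrieval here ranks by utility only, ignoring relevance to the current task.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    text: str
    granularity: str      # "task" (high-level) or "step" (fine-grained)
    utility: float = 0.0  # running estimate of the hindsight utility
    uses: int = 0


@dataclass
class SkillBank:
    skills: list = field(default_factory=list)
    ema: float = 0.5           # assumed smoothing factor
    min_utility: float = -0.1  # assumed pruning threshold
    min_uses: int = 2          # assumed grace period before pruning

    def add(self, text, granularity):
        """Expand the bank with a new skill (e.g. one produced by reflection)."""
        skill = Skill(text, granularity)
        self.skills.append(skill)
        return skill

    def retrieve(self, granularity, k=1):
        """Utility-aware retrieval: top-k skills of one granularity."""
        pool = [s for s in self.skills if s.granularity == granularity]
        return sorted(pool, key=lambda s: s.utility, reverse=True)[:k]

    def update(self, skill, baseline_return, injected_return):
        """Fold the paired-rollout performance gap into the skill's utility."""
        gap = injected_return - baseline_return
        skill.utility = (1 - self.ema) * skill.utility + self.ema * gap
        skill.uses += 1
        return gap

    def prune(self):
        """Drop skills that have been tried enough and still score poorly."""
        self.skills = [
            s for s in self.skills
            if s.uses < self.min_uses or s.utility >= self.min_utility
        ]


bank = SkillBank()
helpful = bank.add("Check the fridge before the countertop", "task")
harmful = bank.add("Repeat the previous action", "step")

bank.update(helpful, baseline_return=0.0, injected_return=1.0)
bank.update(harmful, baseline_return=1.0, injected_return=0.0)
bank.update(harmful, baseline_return=1.0, injected_return=0.0)
bank.prune()
```

In this toy run the step skill hurts performance in both paired rollouts, so its utility turns negative and it is pruned, while the task skill is kept. In a full training loop the same gap would presumably also feed policy optimization, which this sketch leaves out.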


Related papers