Fugu-MT Paper Translation (Abstract): SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training

Paper overview: SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training

arxiv url: http://arxiv.org/abs/2603.18079v1
Date: Wed, 18 Mar 2026 07:16:18 GMT
Status: Translation complete
Last updated in system: 2026-03-20 17:19:05.748078
Title: SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training
Title (reference translation): SLEA-RL: Step-Level Experience-Augmented Reinforcement Learning for Multi-Turn Agent Training
Authors: Prince Zizhuang Wang, Shuli Jiang,
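Abstract summary: This work proposes SLEA-RL (Step-Level Experience-Augmented Reinforcement Learning). SLEA-RL operates through three components: (i) step-level observation clustering that groups structurally equivalent environmental states for efficient cluster-indexed retrieval; (ii) a self-evolving experience library that distills successful strategies and failure patterns through score-based admission and rate-limited extraction; and (iii) policy optimization with step-level credit assignment for fine-grained advantage estimation across multi-turn episodes.
Reference score (attention score computed by this site): 2.291770711277359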
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Model (LLM) agents have shown strong results on multi-turn tool-use tasks, yet they operate in isolation during training, failing to leverage experiences accumulated across episodes. Existing experience-augmented methods address this by organizing trajectories into retrievable libraries, but they retrieve experiences only once based on the initial task description and hold them constant throughout the episode. In multi-turn settings where observations change at every step, this static retrieval becomes increasingly mismatched as episodes progress. We propose SLEA-RL (Step-Level Experience-Augmented Reinforcement Learning), a framework that retrieves relevant experiences at each decision step conditioned on the current observation. SLEA-RL operates through three components: (i) step-level observation clustering that groups structurally equivalent environmental states for efficient cluster-indexed retrieval; (ii) a self-evolving experience library that distills successful strategies and failure patterns through score-based admission and rate-limited extraction; and (iii) policy optimization with step-level credit assignment for fine-grained advantage estimation across multi-turn episodes. The experience library evolves alongside the policy through semantic analysis rather than gradient updates. Experiments on long-horizon multi-turn agent benchmarks demonstrate that SLEA-RL achieves superior performance compared to various reinforcement learning baselines.
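Abstract (reference translation): Large language model (LLM) agents have shown strong results on multi-turn tool-use tasks, but they operate in isolation during training and cannot leverage experience accumulated across episodes. Existing experience-augmented methods address this by organizing trajectories into retrievable libraries, but they retrieve experiences only once, based on the initial task description, and hold them fixed throughout the episode. In multi-turn settings where observations change at every step, this static retrieval becomes increasingly mismatched as the episode progresses. This work proposes SLEA-RL (Step-Level Experience-Augmented Reinforcement Learning). SLEA-RL operates through three components: (i) step-level observation clustering that groups structurally equivalent environmental states for efficient cluster-indexed retrieval; (ii) a self-evolving experience library that distills successful strategies and failure patterns through score-based admission and rate-limited extraction; and (iii) policy optimization with step-level credit assignment for fine-grained advantage estimation across multi-turn episodes. The experience library evolves alongside the policy through semantic analysis rather than gradient updates. Experiments on long-horizon multi-turn agent benchmarks show that SLEA-RL outperforms various reinforcement learning baselines.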
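
The contrast between one-time retrieval and per-step retrieval can be made concrete with a small sketch. The code below is illustrative only and is not the paper's implementation: the names `cluster_key`, `ExperienceLibrary`, and `step_prompts`, the digit-masking heuristic used as a stand-in for "structurally equivalent" states, and the `min_score` and `max_per_cluster` parameters are all assumptions introduced here. Rate-limited extraction and policy optimization are omitted.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


def cluster_key(observation: str) -> str:
    """Hypothetical structural key: lowercase, mask digits, collapse whitespace."""
    masked = re.sub(r"\d+", "<num>", observation.lower())
    return " ".join(masked.split())


@dataclass
class Experience:
    text: str
    score: float


@dataclass
class ExperienceLibrary:
    min_score: float = 0.5    # assumed admission threshold
    max_per_cluster: int = 3  # assumed per-cluster capacity
    clusters: dict = field(default_factory=lambda: defaultdict(list))

    def admit(self, observation: str, text: str, score: float) -> bool:
        """Score-based admission: keep only the top-scoring entries per cluster."""
        if score < self.min_score:
            return False
        exp = Experience(text, score)
        bucket = self.clusters[cluster_key(observation)]
        bucket.append(exp)
        bucket.sort(key=lambda e: e.score, reverse=True)
        del bucket[self.max_per_cluster:]
        # The new entry may have been evicted if the cluster was already full.
        return any(e is exp for e in bucket)

    def retrieve(self, observation: str, k: int = 2) -> list:
        """Cluster-indexed lookup keyed on the current observation."""
        bucket = self.clusters.get(cluster_key(observation), [])
        return [e.text for e in bucket[:k]]


def step_prompts(library: ExperienceLibrary, task: str, observations: list) -> list:
    """Build one prompt per decision step, retrieving again at every step."""
    prompts = []
    for obs in observations:
        hints = library.retrieve(obs)
        prompts.append(f"Task: {task}\nObservation: {obs}\nExperiences: {hints}")
    return prompts
```

In this sketch, observations that differ only in numbers or spacing map to the same cluster, so an experience admitted at one step can be retrieved at a later, structurally similar step, while an observation with no matching cluster simply retrieves nothing.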


Related papers