Fugu-MT Paper Translation (Abstract): Meta-Reinforcement Learning with Self-Reflection for Agentic Search

Paper overview: Meta-Reinforcement Learning with Self-Reflection for Agentic Search

arxiv url: http://arxiv.org/abs/2603.11327v1
Date: Wed, 11 Mar 2026 21:40:26 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:25.661209
Title: Meta-Reinforcement Learning with Self-Reflection for Agentic Search
Title (reference translation): Meta-Reinforcement Learning with Self-Reflection for Agentic Search
Authors: Teng Xiao, Yige Yuan, Hamish Ivison, Huaisheng Zhu, Faeze Brahman, Nathan Lambert, Pradeep Dasigi, Noah A. Smith, Hannaneh Hajishirzi
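Abstract summary: This paper introduces MR-Search, an in-context meta-reinforcement learning (RL) method for agentic search with self-reflection. Instead of optimizing a policy within a single independent episode with sparse rewards, MR-Search trains a policy that conditions on past episodes and adapts its search strategy across episodes.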
Reference score (independently computed attention metric): 101.39929522022514
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces MR-Search, an in-context meta-reinforcement learning (RL) formulation for agentic search with self-reflection. Instead of optimizing a policy within a single independent episode with sparse rewards, MR-Search trains a policy that conditions on past episodes and adapts its search strategy across episodes. MR-Search learns to learn a search strategy with self-reflection, allowing search agents to improve in-context exploration at test time. Specifically, MR-Search performs cross-episode exploration by generating explicit self-reflections after each episode and leveraging them as additional context to guide subsequent attempts, thereby promoting more effective exploration at test time. We further introduce a multi-turn RL algorithm that estimates a dense relative advantage at the turn level, enabling fine-grained credit assignment within each episode. Empirical results across various benchmarks demonstrate the advantages of MR-Search over RL-based baselines, showing strong generalization and relative improvements of 9.2% to 19.3% across eight benchmarks. Our code and data are available at https://github.com/tengxiao1/MR-Search.
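Abstract (reference translation): We introduce MR-Search, an in-context meta-reinforcement learning (RL) method for agentic search that uses self-reflection. Rather than optimizing a policy inside a single, independent episode with sparse rewards, MR-Search trains a policy that conditions on past episodes and adapts its search strategy from one episode to the next. By learning a search strategy through self-reflection, MR-Search lets search agents improve their in-context exploration at test time. Concretely, MR-Search generates an explicit self-reflection after each episode and uses it as additional context to guide later attempts, which promotes more effective exploration at test time. It also introduces a multi-turn RL algorithm that estimates a dense relative advantage at the turn level, enabling fine-grained credit assignment for each episode. Empirical results on various benchmarks show the advantages of MR-Search over RL-based baselines, with strong generalization and relative improvements of 9.2% to 19.3% across eight benchmarks. The code and data are available at https://github.com/tengxiao1/MR-Search.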
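
The cross-episode loop described in the abstract (attempt, reflect, retry with the reflection as extra context) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `run_with_reflection`, `attempt`, `reflect`, `toy_attempt`, and `toy_reflect` are hypothetical, the toy policy is a hard-coded stand-in for a search agent, and no training or advantage estimation is shown.

```python
from typing import Callable, List, Tuple

# Hypothetical type: one episode is summarised as (final answer, scalar reward).
Episode = Tuple[str, float]


def run_with_reflection(
    question: str,
    attempt: Callable[[str, List[str]], Episode],
    reflect: Callable[[str, Episode], str],
    num_episodes: int = 3,
) -> List[Episode]:
    """Run several episodes, passing earlier self-reflections to later ones."""
    reflections: List[str] = []  # context carried across episodes
    history: List[Episode] = []
    for _ in range(num_episodes):
        # The policy conditions on the question plus all reflections so far.
        episode = attempt(question, reflections)
        history.append(episode)
        # An explicit self-reflection is written after every episode.
        reflections.append(reflect(question, episode))
    return history


def toy_attempt(question: str, reflections: List[str]) -> Episode:
    # Assumption for illustration: the toy agent only succeeds
    # once at least one reflection is available.
    if reflections:
        return ("Paris", 1.0)
    return ("Lyon", 0.0)


def toy_reflect(question: str, episode: Episode) -> str:
    answer, reward = episode
    return f"Answer '{answer}' received reward {reward}; revise the search query."


history = run_with_reflection(
    "What is the capital of France?", toy_attempt, toy_reflect, num_episodes=2
)
print(history)
```

In a real system, `attempt` would presumably wrap a multi-turn search rollout of a language model and `reflect` would be generated by the same model; both are left abstract here because the abstract does not specify them.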


Related papers