Fugu-MT Paper Translation (Abstract): An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents

Paper overview: An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents

arxiv url: http://arxiv.org/abs/2505.15117v1
Date: Wed, 21 May 2025 05:09:43 GMT
Status: Translation complete
Last updated in system: 2025-05-22 15:42:58.900544
Title: An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents
Title (reference translation): An empirical study of reinforcement learning for LLM agents that interleave reasoning and search
Authors: Bowen Jin, Jinsung Yoon, Priyanka Kargupta, Sercan O. Arik, Jiawei Han
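Abstract summary: Reinforcement learning (RL) has shown strong potential for training large language models (LLMs) capable of complex reasoning for real-world problem solving. More recently, RL has been used to create sophisticated LLM-based search agents that adeptly combine reasoning with search engine use. Key factors, such as (1) reward formulation, (2) the choice and characteristics of the underlying LLM, and (3) the role of the search engine in the RL process, require further investigation.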
Reference score (independently computed attention level): 34.25887147052966
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) has demonstrated strong potential in training large language models (LLMs) capable of complex reasoning for real-world problem solving. More recently, RL has been leveraged to create sophisticated LLM-based search agents that adeptly combine reasoning with search engine use. While the use of RL for training search agents is promising, the optimal design of such agents remains not fully understood. In particular, key factors -- such as (1) reward formulation, (2) the choice and characteristics of the underlying LLM, and (3) the role of the search engine in the RL process -- require further investigation. In this work, we conduct comprehensive empirical studies to systematically investigate these and offer actionable insights. We highlight several key findings: format rewards are effective in improving final performance, whereas intermediate retrieval rewards have limited impact; the scale and initialization of the LLM (general-purpose vs. reasoning-specialized) significantly influence RL outcomes; and the choice of search engine plays a critical role in shaping RL training dynamics and the robustness of the trained agent during inference. These establish important guidelines for successfully building and deploying LLM-based search agents in real-world applications. Code is available at https://github.com/PeterGriffinJin/Search-R1.
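Abstract (reference translation): Reinforcement learning (RL) has shown strong potential for training large language models (LLMs) that can carry out complex reasoning to solve real-world problems. More recently, RL has been used to build sophisticated LLM-based search agents that skillfully combine reasoning with search engine use. Although using RL to train search agents is promising, the optimal design of such agents is not yet fully understood. In particular, key factors such as (1) reward formulation, (2) the choice and characteristics of the underlying LLM, and (3) the role of the search engine in the RL process require further investigation. This work conducts comprehensive empirical studies to investigate these factors systematically and offer actionable insights. Format rewards are effective in improving final performance, whereas intermediate retrieval rewards have limited impact; the scale and initialization of the LLM (general-purpose vs. reasoning-specialized) significantly influence RL outcomes; and the choice of search engine plays a critical role in shaping RL training dynamics and the robustness of the trained agent during inference. These findings establish important guidelines for building and deploying LLM-based search agents in real-world applications. Code is available at https://github.com/PeterGriffinJin/Search-R1.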


Related papers