Fugu-MT Paper Translation (Abstract): Reinforcement Fine-Tuning for Reasoning towards Multi-Step Multi-Source Search in Large Language Models

Paper overview: Reinforcement Fine-Tuning for Reasoning towards Multi-Step Multi-Source Search in Large Language Models

arxiv url: http://arxiv.org/abs/2506.08352v1
Date: Tue, 10 Jun 2025 02:09:57 GMT
Status: Translation complete
Last updated in system: 2025-06-11 15:11:41.151774
Title: Reinforcement Fine-Tuning for Reasoning towards Multi-Step Multi-Source Search in Large Language Models
Title (reference translation): Reinforcement fine-tuning for reasoning toward multi-step, multi-source search in large language models
Authors: Wentao Shi, Yiqing Shen
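Abstract summary: Reasoning-Search (R-Search) is a single-LLM search framework that unifies multi-step planning, multi-source search execution, and answer synthesis. R-Search structures its output into four explicitly defined components, including reasoning steps that guide the search process.
Reference score (independently computed attention score): 7.719379471690927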
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) can face factual limitations when responding to time-sensitive queries about recent events that arise after their knowledge thresholds in the training corpus. Existing search-augmented approaches fall into two categories, each with distinct limitations: multi-agent search frameworks incur substantial computational overhead by separating search planning and response synthesis across multiple LLMs, while single-LLM tool-calling methods restrict themselves to sequentially planned, single-query searches from a sole search source. We present Reasoning-Search (R-Search), a single-LLM search framework that unifies multi-step planning, multi-source search execution, and answer synthesis within one coherent inference process. Innovatively, it structures the output into four explicitly defined components: reasoning steps that guide the search process (<think>), a natural-language directed acyclic graph (DAG) that represents the search plans with respect to diverse sources (<search>), retrieved results from executing the search plans (<result>), and synthesized final answers (<answer>). To enable effective generation of these structured outputs, we propose a specialized Reinforcement Fine-Tuning (ReFT) method based on GRPO, together with a multi-component reward function that optimizes the LLM's answer correctness, the structural validity of the generated DAG, and adherence to the defined output format. Experimental evaluation on FinSearchBench-24, SearchExpertBench-25, and seven Q&A benchmarks demonstrates that R-Search outperforms state-of-the-art methods, while achieving substantial efficiency gains through a 70% reduction in context token usage and an approximately 50% decrease in execution latency. Code is available at https://github.com/wentao0429/Reasoning-search.
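As a rough illustration of the multi-component reward described in the abstract, the sketch below checks the three properties it names: adherence to the four-tag output format, structural validity of the search plan as a DAG, and answer correctness. The pipe-separated plan line format (`step_id | source | query | dependencies`), the exact-match answer check, the function names, and the equal weights are all assumptions made for illustration. The abstract does not specify the paper's actual reward design or its natural-language DAG syntax.

```python
import re
from graphlib import TopologicalSorter, CycleError

TAGS = ("think", "search", "result", "answer")

def parse_output(text):
    """Return {tag: content} if the four tags appear once each, in order; else None."""
    # Reject outputs where any opening or closing tag is missing or repeated.
    if any(text.count(f"<{t}>") != 1 or text.count(f"</{t}>") != 1 for t in TAGS):
        return None
    pattern = r"\s*".join(rf"<{t}>(.*?)</{t}>" for t in TAGS)
    match = re.fullmatch(r"\s*" + pattern + r"\s*", text, flags=re.DOTALL)
    if match is None:
        return None
    return {t: c.strip() for t, c in zip(TAGS, match.groups())}

def parse_plan(search_text):
    """Parse hypothetical plan lines: 'step_id | source | query | dep1,dep2'."""
    deps = {}
    for line in search_text.splitlines():
        if not line.strip():
            continue
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4 or not parts[0] or parts[0] in deps:
            return None  # malformed line or duplicate step id
        step_id, _source, _query, dep_field = parts
        deps[step_id] = {d.strip() for d in dep_field.split(",") if d.strip()}
    return deps

def is_valid_dag(deps):
    """True if the plan is non-empty, references only defined steps, and has no cycle."""
    if not deps:
        return False
    if any(d not in deps for ds in deps.values() for d in ds):
        return False
    try:
        TopologicalSorter(deps).prepare()  # raises CycleError on a cycle
    except CycleError:
        return False
    return True

def reward(text, gold_answer, w_format=1.0, w_dag=1.0, w_answer=1.0):
    """Weighted sum of format, DAG-validity, and exact-match answer terms (assumed weights)."""
    parsed = parse_output(text)
    if parsed is None:
        return 0.0  # no format credit, and nothing else can be scored
    dag_ok = is_valid_dag(parse_plan(parsed["search"]))
    answer_ok = parsed["answer"].lower() == gold_answer.strip().lower()
    return w_format + w_dag * dag_ok + w_answer * answer_ok
```

In this sketch a malformed output earns nothing, so the DAG and answer terms only matter once the format is right. Whether the paper gates its reward this way is not stated in the abstract.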
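Abstract (reference translation): Large language models (LLMs) can face factual limitations when responding to time-sensitive queries about recent events that occur after the knowledge threshold of their training corpus. Existing search-augmented approaches fall into two categories: multi-agent search frameworks incur substantial computational overhead by separating search planning and response synthesis across multiple LLMs, while single-LLM tool-calling methods restrict themselves to sequentially planned, single-query searches from a sole search source. This paper proposes Reasoning-Search (R-Search), a single-LLM search framework that unifies multi-step planning, multi-source search execution, and answer synthesis within one coherent inference process. It structures the output into four explicitly defined components: reasoning steps that guide the search process (<think>), a natural-language directed acyclic graph that represents the search plans with respect to diverse sources (<search>), retrieved results from executing the search plans (<result>), and synthesized final answers (<answer>). To generate these structured outputs effectively, we propose a specialized Reinforcement Fine-Tuning (ReFT) method based on GRPO, together with a multi-component reward function that optimizes the LLM's answer correctness, the structural validity of the generated DAG, and adherence to the defined output format. In experimental evaluation on FinSearchBench-24, SearchExpertBench-25, and seven Q&A benchmarks, R-Search outperforms state-of-the-art methods while reducing context token usage by 70% and execution latency by approximately 50%. The code is available at https://github.com/wentao0429/Reasoning-search.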
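The abstract states that the fine-tuning method is based on GRPO. As generally described, GRPO samples a group of outputs for the same prompt and scores each one relative to the rest of the group, so no separate value model is needed. The sketch below shows only that group-relative normalization step. The function name, the use of the population standard deviation, and the `eps` constant are assumptions, and the paper's specific ReFT variant may differ.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled output's reward against its own group.

    rewards: scalar rewards for several outputs sampled from one prompt.
    Returns (r_i - mean) / (std + eps) for each reward; eps (an assumed
    constant) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled outputs for one query, scored by some reward function.
advantages = group_relative_advantages([3.0, 1.0, 1.0, 3.0])
```

Outputs that score above the group mean get a positive advantage and are reinforced, and those below get a negative one. When every output in the group earns the same reward, all advantages are zero and that group contributes no learning signal.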

Related papers

Benchmarking Deep Search over Heterogeneous Enterprise Data [73.55304268238474]
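We propose a new benchmark for evaluating a form of retrieval-augmented generation (RAG). RAG requires source-aware, multi-hop reasoning over diverse but related sources. We build it using a synthetic data pipeline that simulates business workflows across product planning, development, and support stages.
Reference translation (metadata) (2025-06-29T08:34:59Z)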
MMSearch-R1: Incentivizing LMMs to Search [49.889749277236376]
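MMSearch-R1 is the first end-to-end reinforcement learning framework that enables on-demand, multi-turn search in real-world Internet environments. The framework integrates both image and text search, and an outcome-based reward with a search penalty lets the model decide when and how to invoke them.
Reference translation (metadata) (2025-06-25T17:59:42Z)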
R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning [0.8388591755871735]
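R-Search is a reinforcement learning framework for reasoning-search integration. It guides large language models to autonomously perform multi-step reasoning with deep search interaction. R-Search learns optimal reasoning-search trajectories through multi-reward signals.
Reference translation (metadata) (2025-06-04T17:29:22Z)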
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
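Large language models (LLMs) have been widely integrated into information retrieval to advance traditional methods. We propose EXSEARCH, an agentic search framework. In experiments on four knowledge-intensive benchmarks, EXSEARCH substantially outperforms the baselines.
Reference translation (metadata) (2025-05-26T15:27:55Z)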
Enhancing LLMs' Reasoning-Intensive Multimedia Search Capabilities through Fine-Tuning and Reinforcement Learning [6.327006563699527]
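This paper introduces SearchExpert, a training method for search agents driven by large language models (LLMs). We reformulate search plans in an efficient natural-language representation to reduce token consumption. To improve reasoning-intensive search capabilities, we propose reinforcement learning from search feedback.
Reference translation (metadata) (2025-05-24T19:00:36Z)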
Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts [67.67746334493302]
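Large language models (LLMs) have shown remarkable capabilities across many tasks, but they often rely on external context to handle complex tasks. We propose a tri-encoder sequential retriever that models this process as a Markov decision process (MDP). The proposed method consistently and significantly outperforms the baselines, highlighting the importance of explicitly modeling dependencies between examples.
Reference translation (metadata) (2025-04-15T17:35:56Z)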
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [50.419872452397684]
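Search-R1 is an extension of reinforcement learning for reasoning frameworks. It generates search queries during step-by-step reasoning with real-time retrieval. Performance improves by 41% (Qwen2.5-7B) and 20% (Qwen2.5-3B).
Reference translation (metadata) (2025-03-12T16:26:39Z)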
Retrieval with Learned Similarities [2.729516456192901]
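State-of-the-art retrieval algorithms have shifted to learned similarities. In this work, we show empirically that Mixture-of-Logits (MoL) achieves strong performance across diverse retrieval scenarios.
Reference translation (metadata) (2024-07-22T08:19:34Z)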
ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
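We propose ACE, a pioneering generative cross-modal retrieval framework for end-to-end cross-modal retrieval. ACE achieves state-of-the-art performance in cross-modal retrieval, outperforming strong baselines on Recall@1 by an average of 15.27%.
Reference translation (metadata) (2024-06-25T12:47:04Z)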
Large Search Model: Redefining Search Stack in the Era of LLMs [63.503320030117145]
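We introduce a new conceptual framework called the large search model, which redefines the conventional search stack by unifying search tasks with a single large language model (LLM). All tasks are formulated as autoregressive text generation problems, and tasks can be customized using natural-language prompts. The proposed framework leverages the strong language understanding and reasoning capabilities of LLMs and offers the potential to improve search result quality while simplifying the existing search stack.
Reference translation (metadata) (2023-10-23T05:52:09Z)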

The related papers list is generated automatically from the titles and abstracts of papers on this site.
