Fugu-MT Paper Translation (Summary): ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Paper summary: ZeroSearch: Incentivize the Search Capability of LLMs without Searching

arxiv url: http://arxiv.org/abs/2505.04588v2
Date: Fri, 16 May 2025 13:53:00 GMT
Status: Translation complete
Last updated in system: 2025-05-19 14:36:13.160424
Title: ZeroSearch: Incentivize the Search Capability of LLMs without Searching
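Title (reference translation): ZeroSearch: Incentivizing the Search Capability of LLMs without Searching
Authors: Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou, Yong Jiang, Pengjun Xie, Yan Zhang, Fei Huang, Jingren Zhou
Abstract summary: We introduce ZeroSearch, a framework that incentivizes the ability of large language models to use a real search engine while relying only on simulated searches during training. The method begins with lightweight supervised fine-tuning that turns an LLM into a retrieval module capable of generating both useful and noisy documents.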
Reference score (attention score computed by the site): 69.55482019211597
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities by interacting with live search engines in real-world environments. While these approaches show promising results, they face two major challenges: (1) Uncontrolled Document Quality: The quality of documents returned by search engines is often unpredictable, introducing noise and instability into the training process. (2) Prohibitively High API Costs: RL training requires frequent rollouts, potentially involving hundreds of thousands of search requests, which incur substantial API expenses and severely constrain scalability. To address these challenges, we introduce ZeroSearch, a novel RL framework that incentivizes the capabilities of LLMs to use a real search engine with simulated searches during training. Our approach begins with lightweight supervised fine-tuning to transform the LLM into a retrieval module capable of generating both useful and noisy documents in response to a query. During RL training, we employ a curriculum-based rollout strategy that incrementally degrades the quality of generated documents, progressively eliciting the model's reasoning ability by exposing it to increasingly challenging retrieval scenarios. Extensive experiments demonstrate that ZeroSearch effectively incentivizes the search capabilities of LLMs using a 3B LLM as the retrieval module. Remarkably, a 7B retrieval module achieves comparable performance to the real search engine, while a 14B retrieval module even surpasses it. Furthermore, it generalizes well across both base and instruction-tuned models of various parameter sizes and is compatible with a wide range of RL algorithms.
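The abstract describes the curriculum-based rollout only qualitatively: the quality of generated documents is "incrementally degraded" as RL training proceeds. The Python sketch below illustrates one way such a schedule could drive a simulated retriever in place of a live search API. It is not the authors' code. The exponential-style ramp, the illustrative defaults (p_start=0.0, p_end=0.75, base=4.0), and the `generate(query, mode)` hook standing in for the fine-tuned retrieval LLM are all assumptions made for illustration.

```python
import random
from typing import Callable, List


def noise_probability(step: int, total_steps: int,
                      p_start: float = 0.0, p_end: float = 0.75,
                      base: float = 4.0) -> float:
    """Hypothetical curriculum schedule (assumption, not from the paper).

    Returns the probability that the simulated retriever is asked for
    noisy documents at a given training step. It rises from p_start to
    p_end, slowly at first and faster later (exponential-style ramp).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if base <= 1.0:
        raise ValueError("base must be greater than 1")
    # Clamp progress to [0, 1] so steps past the end keep p_end.
    frac = min(max(step / total_steps, 0.0), 1.0)
    ramp = (base ** frac - 1.0) / (base - 1.0)  # 0 at start, 1 at end
    return p_start + ramp * (p_end - p_start)


def simulated_search(query: str, step: int, total_steps: int,
                     generate: Callable[[str, str], List[str]],
                     rng: random.Random) -> List[str]:
    """Replace a live search-engine call with an LLM-based simulator.

    `generate(query, mode)` is a hypothetical hook that returns documents
    in the requested mode, either "useful" or "noisy".
    """
    p_noise = noise_probability(step, total_steps)
    mode = "noisy" if rng.random() < p_noise else "useful"
    return generate(query, mode)


if __name__ == "__main__":
    # Stand-in for the fine-tuned retrieval LLM (hypothetical).
    def fake_generate(query: str, mode: str) -> List[str]:
        return [f"[{mode}] document about {query}"]

    rng = random.Random(0)
    for step in (0, 50, 100):
        p = round(noise_probability(step, 100), 3)
        docs = simulated_search("capital of France", step, 100, fake_generate, rng)
        print(step, p, docs)
```

With these defaults the noise probability is 0.0 at step 0, 0.25 at the midpoint, and 0.75 at the final step, so early rollouts see mostly useful documents and later ones face harder, noisier retrieval. A linear ramp would be an equally plausible choice; the exponential form simply keeps the early phase easy for longer.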

Related papers