Fugu-MT Paper Translation (Abstract): AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

Paper overview: AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

arxiv url: http://arxiv.org/abs/2509.24193v1
Date: Mon, 29 Sep 2025 02:14:30 GMT
Status: Translation complete
System update date: 2025-09-30 22:32:19.691158
Title: AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
Title (reference translation): AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
Authors: Ran Xu, Yuchen Zhuang, Zihan Dong, Jonathan Wang, Yue Yu, Joyce C. Ho, Linjun Zhang, Haoyu Wang, Wenqi Shi, Carl Yang
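Abstract summary: AceSearcher trains a single large language model (LLM) to alternate between two roles: a decomposer that breaks down complex queries and a solver that integrates retrieved contexts for answer generation. Experiments on three reasoning-intensive tasks across 10 datasets show that AceSearcher outperforms state-of-the-art baselines, achieving an average exact match improvement of 7.6%.
Reference score (independently computed attention metric): 45.02121903138421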
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Search-augmented LLMs often struggle with complex reasoning tasks due to ineffective multi-hop retrieval and limited reasoning ability. We propose AceSearcher, a cooperative self-play framework that trains a single large language model (LLM) to alternate between two roles: a decomposer that breaks down complex queries and a solver that integrates retrieved contexts for answer generation. AceSearcher couples supervised fine-tuning on a diverse mixture of search, reasoning, and decomposition tasks with reinforcement fine-tuning optimized for final answer accuracy, eliminating the need for intermediate annotations. Extensive experiments on three reasoning-intensive tasks across 10 datasets show that AceSearcher outperforms state-of-the-art baselines, achieving an average exact match improvement of 7.6%. Remarkably, on document-level finance reasoning tasks, AceSearcher-32B matches the performance of the DeepSeek-V3 model using less than 5% of its parameters. Even at smaller scales (1.5B and 8B), AceSearcher often surpasses existing search-augmented LLMs with up to 9x more parameters, highlighting its exceptional efficiency and effectiveness in tackling complex reasoning tasks. Our code will be published at https://github.com/ritaranx/AceSearcher and https://huggingface.co/AceSearcher.
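Abstract (reference translation): Search-augmented LLMs often struggle with complex reasoning tasks because of ineffective multi-hop retrieval and limited reasoning ability. AceSearcher is a cooperative self-play framework that trains a single large language model (LLM) to alternate between two roles: a decomposer that breaks down complex queries and a solver that integrates retrieved contexts to generate answers. AceSearcher combines supervised fine-tuning on a diverse mixture of search, reasoning, and decomposition tasks with reinforcement fine-tuning optimized for final answer accuracy, which removes the need for intermediate annotations. Extensive experiments on three reasoning-intensive tasks across 10 datasets show that AceSearcher outperforms state-of-the-art baselines, achieving an average exact match improvement of 7.6%. Notably, on document-level finance reasoning tasks, AceSearcher-32B matches the performance of the DeepSeek-V3 model while using less than 5% of its parameters. Even at smaller scales (1.5B and 8B), AceSearcher often surpasses existing search-augmented LLMs with up to 9x more parameters, highlighting its efficiency and effectiveness on complex reasoning tasks. The code will be published at https://github.com/ritaranx/AceSearcher and https://huggingface.co/AceSearcher.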
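
The two-role loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt tags (`[DECOMPOSE]`, `[SOLVE]`), the function names, the `LLM` and `Retriever` type aliases, and the toy stand-ins are all hypothetical, and using normalized exact match as the outcome reward is an assumption based only on the abstract's mention of "final answer accuracy" and exact match scores.

```python
import re
import string
from typing import Callable, List

# Hypothetical type aliases; none of these names come from the paper.
LLM = Callable[[str], str]              # prompt -> completion
Retriever = Callable[[str], List[str]]  # query -> retrieved passages


def decompose(llm: LLM, question: str) -> List[str]:
    """Decomposer role: ask the model for one sub-question per line."""
    out = llm(f"[DECOMPOSE]\nQuestion: {question}\nSub-questions:")
    return [line.strip() for line in out.splitlines() if line.strip()]


def solve(llm: LLM, question: str, passages: List[str], notes: List[str]) -> str:
    """Solver role: answer from retrieved passages plus earlier sub-answers."""
    context = "\n".join(passages + notes)
    return llm(f"[SOLVE]\nContext:\n{context}\nQuestion: {question}\nAnswer:").strip()


def answer(llm: LLM, retrieve: Retriever, question: str) -> str:
    """Alternate both roles with the same model, then answer the full question."""
    notes: List[str] = []
    for sub_q in decompose(llm, question):
        sub_a = solve(llm, sub_q, retrieve(sub_q), notes)
        notes.append(f"{sub_q} -> {sub_a}")
    return solve(llm, question, retrieve(question), notes)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_reward(prediction: str, gold: str) -> float:
    """Outcome-only reward (assumed): 1.0 if the final answer matches, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def toy_llm(prompt: str) -> str:
    """Canned stand-in for a real model, with made-up data for illustration."""
    if prompt.startswith("[DECOMPOSE]"):
        return "Who wrote Book X?\nWhere was that author born?"
    if "Question: Who wrote Book X?" in prompt:
        return "Author Y"
    return "The City Z"


def toy_retrieve(query: str) -> List[str]:
    """Canned stand-in for a search backend."""
    return [f"(passage about: {query})"]


if __name__ == "__main__":
    final = answer(toy_llm, toy_retrieve, "Where was the author of Book X born?")
    print(final, exact_match_reward(final, "city z"))
```

Because the reward in this sketch looks only at the final answer, no labels are needed for the intermediate sub-questions, which is one way to read the abstract's claim about eliminating intermediate annotations.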

Related papers