Fugu-MT paper translation (abstract): Beyond the limitation of a single query: Train your LLM for query expansion with Reinforcement Learning

Paper summary: Beyond the limitation of a single query: Train your LLM for query expansion with Reinforcement Learning

arxiv url: http://arxiv.org/abs/2510.10009v1
Date: Sat, 11 Oct 2025 04:23:30 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:29.734658
Title: Beyond the limitation of a single query: Train your LLM for query expansion with Reinforcement Learning
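Title (reference translation): Beyond the limits of a single query: training an LLM for query expansion with Reinforcement Learning
Authors: Shu Zhao, Tan Yu, Anbang Xu
Abstract summary: Reasoning-augmented search agents, such as Search-R1, are trained to reason, search, and generate the final answer iteratively. We train an LLM-based search agent with the native capability of query expansion through reinforcement learning. With the assistance of the squeezer model, we find that even a small-scale 3B LLM can demonstrate a strong capability of query expansion.
Reference score (independently computed attention score): 23.104182075898297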
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reasoning-augmented search agents, such as Search-R1, are trained to reason, search, and generate the final answer iteratively. Nevertheless, due to their limited capabilities in reasoning and search, their performance on multi-hop QA benchmarks remains far from satisfactory. To handle complex or compound queries, we train an LLM-based search agent with the native capability of query expansion through reinforcement learning. In each turn, our search agent proposes several query variants, which are searched simultaneously to cover more relevant information. Meanwhile, given limited post-training data and computing resources, it is very challenging for a search agent to master multiple tasks, including query generation, retrieved information understanding, and answer generation. Therefore, we propose incorporating a pre-trained squeezer model that helps the search agent understand the retrieved documents, allowing the search agent to focus on query generation for high retrieval recall. With the assistance of the squeezer model, we discover that even a small-scale 3B LLM can demonstrate a strong capability of query expansion and achieve state-of-the-art accuracy on the multi-hop QA benchmarks. To be specific, our experiments across seven question-answering benchmarks demonstrate that our method, named ExpandSearch, achieves an average improvement of 4.4% compared to state-of-the-art baselines, with strong gains on multi-hop reasoning tasks requiring diverse evidence aggregation.
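Abstract (reference translation): Reasoning-augmented search agents, such as Search-R1, are trained to reason, search, and generate the final answer iteratively. Nevertheless, because their reasoning and search capabilities are limited, their performance on multi-hop QA benchmarks is not satisfactory. To handle complex or compound queries, we train an LLM-based search agent with the native capability of query expansion through reinforcement learning. In each turn, the search agent proposes several query variants and searches them simultaneously to cover more relevant information. Meanwhile, because post-training data and computing resources are limited, it is very difficult for a search agent to master multiple tasks such as query generation, understanding of retrieved information, and answer generation. We therefore propose incorporating a pre-trained squeezer model that helps the search agent understand the retrieved documents, so that the search agent can focus on query generation for high retrieval recall. With the help of the squeezer model, we find that even a small-scale 3B LLM can show a strong capability of query expansion and achieve state-of-the-art accuracy on multi-hop QA benchmarks. Specifically, we show that the proposed method, ExpandSearch, improves on state-of-the-art baselines by 4.4% on average, with strong gains on multi-hop reasoning tasks that require diverse evidence aggregation.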
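
Illustrative sketch: the abstract describes one search turn as proposing several query variants, searching them simultaneously, and passing the retrieved documents through a squeezer model. The Python below is a minimal toy version of that control flow, not the authors' implementation. Everything in it is an assumption made for illustration: the in-memory CORPUS, the word-overlap retriever in search, the function names expand_and_search and squeeze, and the hand-written query variants (in the paper these come from an RL-trained LLM, and the squeezer is a pre-trained model rather than an overlap heuristic).

```python
from concurrent.futures import ThreadPoolExecutor

# Toy corpus standing in for a real retrieval index (hypothetical data).
CORPUS = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Pierre Curie was born in Paris.",
    "Paris is the capital of France.",
]


def tokenize(text):
    """Lowercase the words of a text and strip basic punctuation."""
    return {w.strip(".,?").lower() for w in text.split()}


def search(query, top_k=2):
    """Toy retriever: rank corpus documents by word overlap with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), doc) for doc in CORPUS]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps corpus order on ties
    return [doc for _, doc in scored[:top_k]]


def expand_and_search(variants, top_k=2):
    """Search all query variants concurrently and merge results without duplicates."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda q: search(q, top_k), variants))
    merged = []
    for docs in result_lists:
        for doc in docs:
            if doc not in merged:
                merged.append(doc)
    return merged


def squeeze(question, docs, max_docs=2):
    """Hypothetical stand-in for the squeezer: keep the documents that
    overlap most with the question (the paper uses a pre-trained model)."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: -len(q & tokenize(d)))
    return ranked[:max_docs]


if __name__ == "__main__":
    question = "What is the capital of the country where Marie Curie was born?"
    # Hand-written variants; an actual agent would generate these itself.
    variants = ["Marie Curie born", "capital of Poland"]
    retrieved = expand_and_search(variants)
    for doc in squeeze(question, retrieved):
        print(doc)
```

In this toy setup the single query "Marie Curie born" never retrieves the document about Poland's capital, while the two variants together do, which is the recall benefit the abstract attributes to query expansion.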


Related papers