Fugu-MT Paper Translation (Abstract): Rethinking On-policy Optimization for Query Augmentation

Paper summary: Rethinking On-policy Optimization for Query Augmentation

arxiv url: http://arxiv.org/abs/2510.17139v1
Date: Mon, 20 Oct 2025 04:16:28 GMT
Status: Translation complete
Last updated in system: 2025-10-25 00:56:39.301052
Title: Rethinking On-policy Optimization for Query Augmentation
Authors: Zhichao Xu, Shengyao Zhuang, Xueguang Ma, Bingsen Chen, Yijun Tian, Fengran Mo, Jie Cao, Vivek Srikumar
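Abstract summary: This paper presents the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks. It then proposes On-policy Pseudo-document Query Expansion (OPQE), a novel hybrid method that learns to generate pseudo-documents that maximize retrieval performance.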
Reference score (attention score computed independently by this site): 49.87723664806526
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in large language models (LLMs) have led to a surge of interest in query augmentation for information retrieval (IR). Two main approaches have emerged. The first prompts LLMs to generate answers or pseudo-documents that serve as new queries, relying purely on the model's parametric knowledge or contextual information. The second applies reinforcement learning (RL) to fine-tune LLMs for query rewriting, directly optimizing retrieval metrics. Although each has its own advantages and limitations, the two approaches have not been compared under consistent experimental conditions. In this work, we present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks, including evidence-seeking, ad hoc, and tool retrieval. Our key finding is that simple, training-free query augmentation often performs on par with, or even surpasses, more expensive RL-based counterparts, especially when using powerful LLMs. Motivated by this discovery, we introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), in which the LLM policy, instead of rewriting a query, learns to generate a pseudo-document that maximizes retrieval performance, thus merging the flexibility and generative structure of prompting with the targeted optimization of RL. We show that OPQE outperforms both standalone prompting and RL-based rewriting, demonstrating that a synergistic approach yields the best results. Our implementation is made available to facilitate reproducibility.
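The idea in the abstract, a generated pseudo-document standing in for the query and a retrieval metric serving as the training signal, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three-document corpus, the IDF-weighted term-overlap retriever, the choice of reciprocal rank as the reward, and the hypothetical `generate_pseudo_document` stub, which returns a canned string where a real system would sample from the LLM policy. The RL update itself is omitted; the sketch only shows how such a reward could be computed.

```python
import math

# Toy corpus, invented for illustration only.
CORPUS = {
    "d1": "the eiffel tower is a wrought iron lattice tower in paris france",
    "d2": "the colosseum is an ancient amphitheatre in rome italy",
    "d3": "mount fuji is the tallest mountain in japan",
}


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def build_idf(corpus):
    """Inverse document frequency: terms present in every document get weight 0."""
    n_docs = len(corpus)
    doc_freq = {}
    for text in corpus.values():
        for token in set(tokenize(text)):
            doc_freq[token] = doc_freq.get(token, 0) + 1
    return {token: math.log(n_docs / count) for token, count in doc_freq.items()}


def retrieve(query_text, corpus, idf):
    """Rank document ids by the summed IDF of terms shared with the query."""
    query_tokens = set(tokenize(query_text))
    scores = {}
    for doc_id, text in corpus.items():
        shared = query_tokens & set(tokenize(text))
        scores[doc_id] = sum(idf.get(token, 0.0) for token in shared)
    # sorted() is stable, so tied documents keep corpus order.
    return sorted(corpus, key=lambda doc_id: -scores[doc_id])


def reciprocal_rank(ranking, relevant_id):
    """1 / rank of the relevant document, or 0.0 if it is not retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def generate_pseudo_document(query_text):
    """Hypothetical stand-in for the LLM policy; a real system would sample text here."""
    return "gladiators fought in the colosseum an ancient amphitheatre in rome"


def retrieval_reward(query_text, relevant_id, corpus, idf):
    """Scalar reward an RL trainer could assign to the text used as the query."""
    return reciprocal_rank(retrieve(query_text, corpus, idf), relevant_id)


IDF = build_idf(CORPUS)
QUERY = "tallest arena where gladiators fought"
RELEVANT = "d2"

raw_reward = retrieval_reward(QUERY, RELEVANT, CORPUS, IDF)
expanded_reward = retrieval_reward(generate_pseudo_document(QUERY), RELEVANT, CORPUS, IDF)
print(f"raw query reward: {raw_reward:.3f}")
print(f"pseudo-document reward: {expanded_reward:.3f}")
```

In this toy setup the raw query shares only the misleading term "tallest" with the corpus, so the relevant document ranks last, while the pseudo-document shares several distinctive terms with it and ranks it first. An on-policy method in the spirit of the abstract would use a reward of this kind to update the generator, whereas a prompting-only method would use the generated text as is.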

Related papers