Fugu-MT Paper Translation (Abstract): When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning

Paper overview: When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning

arxiv url: http://arxiv.org/abs/2601.21208v1
Date: Thu, 29 Jan 2026 03:16:53 GMT
Status: Translation complete
System update date: 2026-01-30 16:22:49.539891
Title: When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning
Title (reference translation): When Should I Search: Adaptive Complex Query Optimization via Reinforcement Learning
Authors: Wei Wen, Sihang Deng, Tianjun Wei, Keyu Chen, Ruizhi Qiao, Xing Sun
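Abstract summary: We propose a new RL framework called Adaptive Complex Query Optimization (ACQO). The framework is designed to adaptively decide when and how to expand the search process. ACQO achieves state-of-the-art performance on three complex query benchmarks, significantly outperforming established baselines.
Reference score (independently computed attention score): 26.489185170468062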
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Query optimization is a crucial component for the efficacy of Retrieval-Augmented Generation (RAG) systems. While reinforcement learning (RL)-based agentic and reasoning methods have recently emerged as a promising direction on query optimization, most existing approaches focus on the expansion and abstraction of a single query. However, complex user queries are prevalent in real-world scenarios, often requiring multiple parallel and sequential search strategies to handle disambiguation and decomposition. Directly applying RL to these complex cases introduces significant hurdles. Determining the optimal number of sub-queries and effectively re-ranking and merging retrieved documents vastly expands the search space and complicates reward design, frequently leading to training instability. To address these challenges, we propose a novel RL framework called Adaptive Complex Query Optimization (ACQO). Our framework is designed to adaptively determine when and how to expand the search process. It features two core components: an Adaptive Query Reformulation (AQR) module that dynamically decides when to decompose a query into multiple sub-queries, and a Rank-Score Fusion (RSF) module that ensures robust result aggregation and provides stable reward signals for the learning agent. To mitigate training instabilities, we adopt a Curriculum Reinforcement Learning (CRL) approach, which stabilizes the training process by progressively introducing more challenging queries through a two-stage strategy. Our comprehensive experiments demonstrate that ACQO achieves state-of-the-art performance on three complex query benchmarks, significantly outperforming established baselines. The framework also showcases improved computational efficiency and broad compatibility with different retrieval architectures, establishing it as a powerful and generalizable solution for next-generation RAG systems.
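Abstract (reference translation): Query optimization is an important factor in the effectiveness of Retrieval-Augmented Generation (RAG) systems. Agentic and reasoning methods based on reinforcement learning (RL) have recently emerged as a promising direction for query optimization, but most existing approaches focus on expanding and abstracting a single query. Complex user queries, however, are common in real-world scenarios and often require multiple parallel and sequential search strategies to handle disambiguation and decomposition. Applying RL directly to these complex cases raises major hurdles: determining the optimal number of sub-queries and effectively re-ranking and merging the retrieved documents greatly enlarges the search space and complicates reward design, which often leads to unstable training. To address these challenges, we propose a new RL framework called Adaptive Complex Query Optimization (ACQO). The framework is designed to adaptively decide when and how to expand the search process. It has two core components: an Adaptive Query Reformulation (AQR) module, which dynamically decides when to decompose a query into multiple sub-queries, and a Rank-Score Fusion (RSF) module, which ensures robust result aggregation and provides stable reward signals to the learning agent. To mitigate training instability, we adopt a Curriculum Reinforcement Learning (CRL) approach that stabilizes training by progressively introducing more difficult queries through a two-stage strategy. Comprehensive experiments show that ACQO achieves state-of-the-art performance on three complex query benchmarks, significantly outperforming established baselines. The framework also shows improved computational efficiency and broad compatibility with different retrieval architectures, establishing it as a powerful and generalizable solution for next-generation RAG systems.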
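
Illustrative note: the abstract does not say how the RSF module merges the documents retrieved for each sub-query. As a rough illustration of what merging several ranked lists can look like, the sketch below uses Reciprocal Rank Fusion (RRF), a standard technique swapped in here as a stand-in. It is not the paper's RSF method. The function name, the document IDs, and the constant `k=60` are all assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retrieval results for two sub-queries of one complex query.
sub_query_results = [
    ["d1", "d2", "d3"],
    ["d2", "d4", "d1"],
]
fused = reciprocal_rank_fusion(sub_query_results)
print(fused)
```

Documents that rank well in several sub-query lists (here `d2` and `d1`) rise to the top of the fused list, which is the general behavior one would want from any aggregation step of this kind.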

Related papers