Fugu-MT Paper Translation (Abstract): AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning

Paper summary: AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning

arxiv url: http://arxiv.org/abs/2508.20368v3
Date: Tue, 09 Sep 2025 06:38:41 GMT
Status: Translation complete
Last updated in system: 2025-09-10 12:33:22.771749
Title: AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning
Title (reference translation): AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning
Authors: Lang Mei, Zhihan Yang, Chong Chen,
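Abstract summary: We propose AI-SearchPlanner, a novel reinforcement learning framework that improves the performance of frozen QA models by focusing on search planning. Experiments on real-world datasets show that AI-SearchPlanner outperforms existing RL-based search agents in both effectiveness and efficiency.
Reference score (independently computed attention metric): 7.913125061214038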
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent studies have explored integrating Large Language Models (LLMs) with search engines to leverage both the LLMs' internal pre-trained knowledge and external information. In particular, reinforcement learning (RL) has emerged as a promising paradigm for enhancing LLM reasoning through multi-turn interactions with search engines. However, existing RL-based search agents rely on a single LLM to handle both search planning and question-answering (QA) tasks in an end-to-end manner, which limits their ability to optimize both capabilities simultaneously. In practice, sophisticated AI search systems often employ a large, frozen LLM (e.g., GPT-4, DeepSeek-R1) to ensure high-quality QA. Thus, a more effective and efficient approach is to utilize a small, trainable LLM dedicated to search planning. In this paper, we propose AI-SearchPlanner, a novel reinforcement learning framework designed to enhance the performance of frozen QA models by focusing on search planning. Specifically, to achieve these objectives, our approach introduces three key innovations: 1) Decoupling the Architecture of the Search Planner and Generator, 2) Dual-Reward Alignment for Search Planning, and 3) Pareto Optimization of Planning Utility and Cost. Extensive experiments on real-world datasets demonstrate that AI-SearchPlanner outperforms existing RL-based search agents in both effectiveness and efficiency, while exhibiting strong generalization capabilities across diverse frozen QA models and data domains.
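Abstract (reference translation): Recent work has studied the integration of Large Language Models (LLMs) with search engines in order to use both the LLMs' internally learned knowledge and external information. In particular, reinforcement learning (RL) has emerged as a promising paradigm for strengthening LLM reasoning through multi-turn interaction with search engines. However, existing RL-based search agents depend on a single LLM to handle both search planning and question answering (QA) end to end, which limits their ability to optimize both capabilities at once. In practice, advanced AI search systems often use a large frozen LLM (e.g., GPT-4, DeepSeek-R1) to guarantee high-quality QA. A more effective and efficient approach is therefore to use a small, trainable LLM dedicated to search planning. This paper proposes AI-SearchPlanner, a new reinforcement learning framework that aims to improve the performance of frozen QA models by focusing on search planning. Specifically, to achieve these objectives, the approach has three key innovations: 1) decoupling the architecture of the search planner and the generator, 2) dual-reward alignment for search planning, and 3) Pareto optimization of planning utility and cost. Extensive experiments on real-world datasets show that AI-SearchPlanner outperforms existing RL-based search agents in both effectiveness and efficiency, and generalizes strongly across diverse frozen QA models and data domains.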
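As a rough illustration of what trading planning utility against cost means, the sketch below filters a set of candidate search plans down to the Pareto-optimal ones, i.e. those that no other plan beats on both objectives. This is not the paper's training procedure, which the abstract does not detail. The names `dominates` and `pareto_front` and the (utility, cost) numbers are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Each candidate plan is (utility, cost). Values are made up for illustration,
# not results from the paper.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and strictly
    better on at least one. Higher utility is better; lower cost is better."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (input order kept)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


if __name__ == "__main__":
    plans = [(0.60, 1.0), (0.72, 3.0), (0.70, 4.0), (0.75, 5.0), (0.60, 2.0)]
    print(pareto_front(plans))
```

In this toy input, the plan (0.70, 4.0) is dropped because (0.72, 3.0) gives higher utility at lower cost, and (0.60, 2.0) is dropped because (0.60, 1.0) matches its utility more cheaply. The remaining plans each represent a different utility/cost trade-off.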


Related papers