Fugu-MT Paper Translation (Abstract): Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection

Paper summary: Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection

arxiv url: http://arxiv.org/abs/2605.10235v2
Date: Tue, 12 May 2026 12:50:30 GMT
Status: Translation complete
Last updated in system: 2026-05-13 18:21:07.114177
Title: Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection
Title (reference translation): Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection
Authors: Yiwen Chen, Kuan Li, Fuzhen Zhuang, Deqing Wang, Zhao Zhang, Liwen Zhang, Yong Jiang, Shuai Wang, Minhao Cheng
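Abstract summary: Pre-Route is a proactive routing framework that performs structured reasoning before answering. The study presents three key findings: (i) LLMs possess latent routing ability that can be reliably elicited with guidelines; (ii) linear probes reveal that structured prompts sharpen the separability of the optimal routing dimension in representation space; and (iii) distillation transfers this reasoning structure to smaller models for lightweight deployment.
Reference score (independently computed attention score): 57.3886742625188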
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in large language models (LLMs) have expanded the context window to beyond 128K tokens, enabling long-document understanding and multi-source reasoning. A key challenge, however, lies in choosing between retrieval-augmented generation (RAG) and long-context (LC) strategies: RAG is efficient but constrained by retrieval quality, while LC supports global reasoning at higher cost and with position sensitivity. Existing methods such as Self-Route adopt failure-driven fallback from RAG to LC, but remain passive, inefficient, and hard to interpret. We propose Pre-Route, a proactive routing framework that performs structured reasoning before answering. Using lightweight metadata (e.g., document type, length, initial snippet), Pre-Route enables task analysis, coverage estimation, and information-need prediction, producing explainable and cost-efficient routing decisions. Our study shows three key findings: (i) LLMs possess latent routing ability that can be reliably elicited with guidelines, allowing single-sample performance to approach that of multi-sample (Best-of-N) results; (ii) linear probes reveal that structured prompts sharpen the separability of the "optimal routing dimension" in representation space; and (iii) distillation transfers this reasoning structure to smaller models for lightweight deployment. Experiments on LaRA (in-domain) and LongBench-v2 (OOD) confirm that Pre-Route outperforms Always-RAG, Always-LC, and Self-Route baselines, achieving superior overall cost-effectiveness.
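Abstract (reference translation): Recent advances in large language models (LLMs) have extended context windows beyond 128K tokens, enabling long-document understanding and multi-source reasoning. A key challenge, however, is choosing between retrieval-augmented generation (RAG) and long-context (LC) strategies. Existing methods such as Self-Route adopt a failure-driven fallback from RAG to LC, but they are passive, inefficient, and hard to interpret. We propose Pre-Route, a proactive routing framework that performs structured reasoning before answering. Using lightweight metadata (e.g., document type, length, initial snippet), Pre-Route enables task analysis, coverage estimation, and information-need prediction, and produces explainable, cost-efficient routing decisions. Our study presents three key findings: (i) LLMs possess latent routing ability that can be reliably elicited with guidelines, so that single-sample performance can approach multi-sample (Best-of-N) results; (ii) linear probes reveal that structured prompts sharpen the separability of the "optimal routing dimension" in representation space; and (iii) distillation transfers this reasoning structure to smaller models for lightweight deployment. Experiments on LaRA (in-domain) and LongBench-v2 (OOD) confirm that Pre-Route outperforms the Always-RAG, Always-LC, and Self-Route baselines and achieves better overall cost-effectiveness.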
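
Illustrative sketch (not the authors' implementation): the abstract describes routing from lightweight metadata through task analysis, coverage estimation, and information-need prediction before any answer is generated. The Python below shows one way such a pre-routing prompt could be assembled and its decision parsed. The names `DocMetadata`, `build_routing_prompt`, and `parse_route`, the prompt wording, the `ROUTE:` output convention, and the default of falling back to LC are all assumptions made for illustration; the paper's actual prompts and guidelines are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class DocMetadata:
    """Lightweight metadata of the kind the abstract lists (field names are hypothetical)."""
    doc_type: str        # e.g. "paper", "novel", "financial report"
    length_tokens: int   # approximate document length in tokens
    snippet: str         # initial snippet of the document


def build_routing_prompt(question: str, meta: DocMetadata) -> str:
    """Assemble a structured prompt asking a model to pick a route before answering."""
    return "\n".join([
        "Decide how to answer the question: RAG or LC (long context).",
        f"Document type: {meta.doc_type}",
        f"Document length (tokens): {meta.length_tokens}",
        f"Initial snippet: {meta.snippet}",
        f"Question: {question}",
        "Step 1 - Task analysis: what kind of question is this?",
        "Step 2 - Coverage estimation: could a few retrieved chunks cover the answer?",
        "Step 3 - Information need: local facts or global reasoning?",
        "Final line must be exactly 'ROUTE: RAG' or 'ROUTE: LC'.",
    ])


def parse_route(model_output: str, default: str = "LC") -> str:
    """Return the route named on the last 'ROUTE:' line, or `default` if none is valid."""
    for line in reversed(model_output.strip().splitlines()):
        line = line.strip()
        if line.upper().startswith("ROUTE:"):
            choice = line.split(":", 1)[1].strip().upper()
            if choice in ("RAG", "LC"):
                return choice
    return default
```

In this sketch the routing decision is made once, up front, from metadata alone; a failure-driven approach such as the Self-Route baseline named in the abstract would instead attempt RAG first and switch to LC only after that attempt fails. The prompt would be sent to whatever LLM client is in use, which is deliberately left out here.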

Related papers