Query Routing for Retrieval-Augmented Language Models
- URL: http://arxiv.org/abs/2505.23052v1
- Date: Thu, 29 May 2025 03:44:56 GMT
- Title: Query Routing for Retrieval-Augmented Language Models
- Authors: Jiarui Zhang, Xiangyu Liu, Yong Hu, Chaoyue Niu, Fan Wu, Guihai Chen
- Abstract summary: Retrieval-Augmented Generation (RAG) significantly improves the performance of Large Language Models (LLMs) on knowledge-intensive tasks. We observe that external documents dynamically affect LLMs' ability to answer queries, while existing routing methods exhibit suboptimal performance in RAG scenarios. We propose RAGRouter, a RAG-aware routing design, which leverages document embeddings and RAG capability embeddings with contrastive learning to capture knowledge representation shifts.
- Score: 38.05904245087491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-Augmented Generation (RAG) significantly improves the performance of Large Language Models (LLMs) on knowledge-intensive tasks. However, varying response quality across LLMs under RAG necessitates intelligent routing mechanisms, which select the most suitable model for each query from multiple retrieval-augmented LLMs via a dedicated router model. We observe that external documents dynamically affect LLMs' ability to answer queries, while existing routing methods, which rely on static parametric knowledge representations, exhibit suboptimal performance in RAG scenarios. To address this, we formally define the new retrieval-augmented LLM routing problem, incorporating the influence of retrieved documents into the routing framework. We propose RAGRouter, a RAG-aware routing design, which leverages document embeddings and RAG capability embeddings with contrastive learning to capture knowledge representation shifts and enable informed routing decisions. Extensive experiments on diverse knowledge-intensive tasks and retrieval settings show that RAGRouter outperforms the best individual LLM by 3.61% on average and existing routing methods by 3.29%-9.33%. With an extended score-threshold-based mechanism, it also achieves strong performance-efficiency trade-offs under low-latency constraints.
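The routing design described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the class and function names, the cosine-similarity scoring rule, and the cheapest-first threshold walk are not taken from the paper, whose actual architecture and contrastive training objective are not reproduced here.

```python
# Minimal sketch of RAG-aware routing with a score-threshold mechanism.
# All names and the scoring rule are illustrative assumptions, not the
# paper's implementation.
import torch
import torch.nn.functional as F

class RAGAwareRouterSketch(torch.nn.Module):
    def __init__(self, num_llms: int, dim: int = 768):
        super().__init__()
        # One learned "RAG capability" embedding per candidate LLM.
        self.capability = torch.nn.Embedding(num_llms, dim)
        # Projects concatenated query + document embeddings into the same
        # space, so retrieved documents can shift the knowledge representation
        # that routing decisions are based on.
        self.proj = torch.nn.Linear(2 * dim, dim)

    def scores(self, query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
        # (batch, dim) query/document embeddings -> (batch, num_llms) scores.
        state = self.proj(torch.cat([query_emb, doc_emb], dim=-1))
        return F.cosine_similarity(
            state.unsqueeze(1), self.capability.weight.unsqueeze(0), dim=-1
        )

def route(router, query_emb, doc_emb, threshold=0.5, cost_order=None):
    # Score-threshold variant: walk the LLMs from cheapest to most expensive
    # and stop at the first one whose predicted score clears the threshold.
    s = router.scores(query_emb, doc_emb).squeeze(0)
    order = cost_order if cost_order is not None else range(len(s))
    for i in order:
        if s[i] >= threshold:
            return i
    return int(s.argmax())  # fall back to the best-scoring LLM
```

In a sketch like this, the threshold is the performance-efficiency knob: raising it pushes more queries to stronger (and costlier) models, while lowering it keeps latency down by accepting cheaper models more often.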
Related papers
- RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR-Router is a role-aware context routing framework for multi-agent large language model (LLM) systems. It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage. A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
arXiv Detail & Related papers (2025-08-06T21:59:34Z) - DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router [57.28685457991806]
DeepSieve is an agentic RAG framework that incorporates information sieving via LLM-as-a-knowledge-router. Our design emphasizes modularity, transparency, and adaptability, leveraging recent advances in agentic system design.
arXiv Detail & Related papers (2025-07-29T17:55:23Z) - LTRR: Learning To Rank Retrievers for LLMs [53.285436927963865]
We show that routing-based RAG systems can outperform the best single-retriever-based systems. Performance gains are especially pronounced in models trained with the Answer Correctness (AC) metric. As part of the SIGIR 2025 LiveRAG challenge, our submitted system demonstrated the practical viability of our approach.
arXiv Detail & Related papers (2025-06-16T17:53:18Z) - Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning [12.878608250420832]
We present Router-R1, a reinforcement learning framework that formulates multi-LLM routing and aggregation as a sequential decision process. To facilitate learning, we employ a lightweight rule-based reward comprising format rewards, final outcome rewards, and a novel cost reward for optimizing the balance between performance and cost.
arXiv Detail & Related papers (2025-06-10T17:56:45Z) - RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing [31.446419903916425]
RadialRouter is a novel framework for large language model routing. It uses a lightweight Transformer-based backbone with a radial structure, named RadialFormer, to articulate the query-LLM relationship. It significantly outperforms existing routing methods by 9.2% and 5.8% in the Balance and Cost First scenarios.
arXiv Detail & Related papers (2025-06-04T12:16:41Z) - IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory [26.39979967537193]
Large language models (LLMs) have demonstrated exceptional performance across a wide range of natural language tasks. While powerful models deliver better results, they come at a high cost, whereas smaller models are more cost-effective but less capable. We propose IRT-Router, a multi-LLM routing framework that efficiently routes user queries to the most suitable LLM.
arXiv Detail & Related papers (2025-06-01T15:14:58Z) - Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning [60.84901522792042]
Multimodal Retrieval-Augmented Generation (MRAG) has shown promise in mitigating hallucinations in Multimodal Large Language Models (MLLMs). We propose R1-Router, a novel MRAG framework that learns to decide when and where to retrieve knowledge based on the evolving reasoning state. R1-Router can adaptively and effectively leverage diverse KBs, reducing unnecessary retrievals and improving both efficiency and accuracy.
arXiv Detail & Related papers (2025-05-28T08:17:57Z) - LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs -- No Silver Bullet for LC or RAG Routing [70.35888047551643]
We present LaRA, a novel benchmark specifically designed to rigorously compare RAG and LC LLMs. LaRA encompasses 2,326 test cases across four practical QA task categories and three types of naturally occurring long texts. We find that the optimal choice between RAG and LC depends on a complex interplay of factors, including the model's parameter size, long-text capabilities, context length, task type, and the characteristics of the retrieved chunks.
arXiv Detail & Related papers (2025-02-14T08:04:22Z) - Universal Model Routing for Efficient LLM Inference [72.65083061619752]
We consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose a new approach to this problem that relies on representing each LLM as a feature vector, derived from its predictions on a set of representative prompts (see the sketch after this list). We prove that these strategies are estimates of a theoretically optimal routing rule, and provide an excess risk bound to quantify their errors.
arXiv Detail & Related papers (2025-02-12T20:30:28Z) - Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation [43.630437906898635]
We propose a novel two-stage fine-tuning architecture called Invar-RAG.
In the retrieval stage, an LLM-based retriever is constructed by integrating LoRA-based representation learning.
In the generation stage, a refined fine-tuning method is employed to improve LLM accuracy in generating answers based on retrieved information.
arXiv Detail & Related papers (2024-11-11T14:25:37Z) - RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards [78.74923079748521]
Retrieval-Augmented Generation (RAG) has proven its effectiveness in mitigating hallucinations in Large Language Models (LLMs) by retrieving knowledge from external resources. Current approaches use instruction tuning to optimize LLMs, improving their ability to utilize retrieved knowledge. We propose a Differentiable Data Rewards (DDR) method, which trains RAG systems by aligning data preferences between different RAG modules.
arXiv Detail & Related papers (2024-10-17T12:53:29Z) - GraphRouter: A Graph-based Router for LLM Selections [13.463815950807874]
We introduce a novel inductive graph framework, named GraphRouter, to enhance the selection of Large Language Models (LLMs). Through an edge prediction mechanism, GraphRouter is able to predict attributes (the effect and cost of an LLM response) of potential edges, allowing for optimized recommendations. In experiments, GraphRouter substantially surpasses existing routers, delivering a minimum performance improvement of 12.3%.
arXiv Detail & Related papers (2024-10-04T18:02:48Z)
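On the "Universal Model Routing" idea referenced above: the sketch below is a hedged illustration of representing each LLM as a feature vector built from its predictions on representative prompts, so that LLMs unseen during training can still be routed. The binary-correctness features, dot-product scoring, and all names here are illustrative assumptions, not the paper's actual estimator.

```python
# Illustrative sketch (not the paper's method): represent each LLM by its
# correctness on a fixed set of representative prompts, then route a new
# query to the LLM whose correctness profile best matches the prompts the
# query resembles.
import numpy as np

def llm_feature_vector(answers_correct: list[bool]) -> np.ndarray:
    # Binary correctness on K representative prompts -> K-dim feature vector.
    return np.asarray(answers_correct, dtype=float)

def route_unseen(query_sim_to_prompts: np.ndarray,
                 llm_features: dict[str, np.ndarray]) -> str:
    # query_sim_to_prompts: similarity of the query to each representative
    # prompt. Score every LLM (including ones added after training) by the
    # dot product of this similarity profile with its feature vector.
    scores = {name: float(query_sim_to_prompts @ feats)
              for name, feats in llm_features.items()}
    return max(scores, key=scores.get)

# Example: two LLMs profiled on three representative prompts.
feats = {"llm_a": llm_feature_vector([True, False, True]),
         "llm_b": llm_feature_vector([False, True, True])}
print(route_unseen(np.array([0.9, 0.1, 0.2]), feats))  # -> "llm_a"
```

Because a new LLM only needs to be profiled on the fixed prompt set, this style of routing extends to models that were unavailable when the router was trained, which is the dynamic-routing setting that paper targets.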
This list is automatically generated from the titles and abstracts of the papers on this site.