Related papers: LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing

LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing

URL: http://arxiv.org/abs/2601.07206v1
Date: Mon, 12 Jan 2026 05:01:15 GMT
Title: LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing
Authors: Hao Li, Yiqun Zhang, Zhaoyan Guo, Chenxu Wang, Shengji Tang, Qiaosheng Zhang, Yang Chen, Biqing Qi, Peng Ye, Lei Bai, Zhen Wang, Shuyue Hu,
Abstract summary: Large language model (LLM) routing assigns each query to the most suitable model from an ensemble.<n>We introduce LLMBench, a large-scale benchmark and unified framework for LLM routing.<n>It comprises over 400K instances from 21 datasets and 33 models.
Score: 44.046399484829635
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM) routing assigns each query to the most suitable model from an ensemble. We introduce LLMRouterBench, a large-scale benchmark and unified framework for LLM routing. It comprises over 400K instances from 21 datasets and 33 models. Moreover, it provides comprehensive metrics for both performance-oriented routing and performance-cost trade-off routing, and integrates 10 representative routing baselines. Using LLMRouterBench, we systematically re-evaluate the field. While confirming strong model complementarity-the central premise of LLM routing-we find that many routing methods exhibit similar performance under unified evaluation, and several recent approaches, including commercial routers, fail to reliably outperform a simple baseline. Meanwhile, a substantial gap remains to the Oracle, driven primarily by persistent model-recall failures. We further show that backbone embedding models have limited impact, that larger ensembles exhibit diminishing returns compared to careful model curation, and that the benchmark also enables latency-aware analysis. All code and data are available at https://github.com/ynulihao/LLMRouterBench.

Related papers

Dr.LLM: Dynamic Layer Routing in LLMs [55.11953638340419]
Dr.LLM is a retrofittable framework that equips pretrained models with lightweight per-layer routers deciding to skip, execute, or repeat a block.<n>On ARC (logic) and DART (math), Dr.LLM improves accuracy by up to +3.4%p while saving 5 layers per example on average.
arXiv Detail & Related papers (2025-10-14T17:51:26Z)
RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR is a role-aware context routing framework for multi-agent large language model (LLM) systems.<n>It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage.<n>A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
arXiv Detail & Related papers (2025-08-06T21:59:34Z)
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning [27.70756702796812]
We present textbf generalization-R1, a reinforcement learning framework that formulates multi-LLM routing and aggregation as a sequential decision process.<n>To facilitate learning, we employ a lightweight rule-based reward comprising format rewards, final outcome rewards, and a novel cost reward for optimizing the balance between performance and cost.
arXiv Detail & Related papers (2025-06-10T17:56:45Z)
RAGRouter: Learning to Route Queries to Multiple Retrieval-Augmented Language Models [45.58601993849455]
Retrieval-Augmented Generation (RAG) significantly improves the performance of Large Language Models (LLMs) on knowledge-intensive tasks.<n>We observe that external documents dynamically affect LLMs' ability to answer queries, while existing routing methods exhibit suboptimal performance in RAG scenarios.<n>We propose RAG, a parametric RAG-aware routing design, which leverages document embeddings and RAG capability embeddings with contrastive learning to capture knowledge representation shifts.
arXiv Detail & Related papers (2025-05-29T03:44:56Z)
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs [45.93874913792025]
We show a novel model-level scaling up phenomenon in routing large language models (LLMs)<n>This improvement can even surpass the performance of the best single model in the pool and many existing strong LLMs.<n>We introduce RouterEval, a benchmark tailored for router research, which includes over 200,000,000 performance records for 12 popular LLM evaluations.
arXiv Detail & Related papers (2025-03-08T04:07:07Z)
Universal Model Routing for Efficient LLM Inference [69.86195589350264]
Model routing is a technique for reducing the inference cost of large language models (LLMs)<n>We propose UniRoute, a new approach to the problem of dynamic routing, where new, previously unobserved LLMs are available at test time.<n>We show that these are estimates of a theoretically optimal routing rule, and quantify their errors via an excess risk bound.
arXiv Detail & Related papers (2025-02-12T20:30:28Z)
RouterBench: A Benchmark for Multi-LLM Routing System [25.515453832224804]
No single model can optimally address all tasks and applications, particularly when balancing performance with cost. This limitation has led to the development of LLM routing systems, which combine the strengths of various models to overcome the constraints of individual LLMs. We present RouterBench, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems.
arXiv Detail & Related papers (2024-03-18T17:59:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.