When Routing Collapses: On the Degenerate Convergence of LLM Routers
- URL: http://arxiv.org/abs/2602.03478v1
- Date: Tue, 03 Feb 2026 12:51:55 GMT
- Title: When Routing Collapses: On the Degenerate Convergence of LLM Routers
- Authors: Guannan Lai, Han-Jia Ye
- Abstract summary: As the user's cost budget increases, routers systematically default to the most capable and most expensive model. We propose EquiRouter, a decision-aware router that directly learns model rankings. On RouterBench, EquiRouter reduces cost by about 17% at GPT-4-level performance compared to the strongest prior router.
- Score: 46.01380774114097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM routing aims to achieve a favorable quality–cost trade-off by dynamically assigning easy queries to smaller models and harder queries to stronger ones. However, across both unimodal and multimodal settings, we uncover a pervasive yet underexplored failure mode in existing routers: as the user's cost budget increases, routers systematically default to the most capable and most expensive model even when cheaper models already suffice. As a result, current routers under-utilize small models, wasting computation and monetary cost and undermining the core promise of routing; we term this phenomenon routing collapse. We attribute routing collapse to an objective–decision mismatch: many routers are trained to predict scalar performance scores, whereas routing decisions ultimately depend on discrete comparisons among candidate models. Consequently, small prediction errors can flip relative orderings and trigger suboptimal selections. To bridge this gap, we propose EquiRouter, a decision-aware router that directly learns model rankings, restoring the role of smaller models and mitigating routing collapse. On RouterBench, EquiRouter reduces cost by about 17% at GPT-4-level performance compared to the strongest prior router. Our code is available at https://github.com/AIGNLAI/EquiRouter.
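The objective–decision mismatch described above can be illustrated with a minimal sketch. The function names, the threshold-based "sufficiency" rule, and all numbers below are illustrative assumptions, not EquiRouter's actual ranking objective; they only show how an argmax over noisy scalar predictions drifts to the most expensive model, while a cheapest-sufficient decision rule keeps the small model in play.

```python
def scalar_score_router(pred_scores, costs, budget):
    """Score-prediction routing: take the highest predicted score among
    affordable models. A small regression error is enough to flip the
    argmax toward the most expensive model (routing collapse)."""
    affordable = [i for i, c in enumerate(costs) if c <= budget]
    return max(affordable, key=lambda i: pred_scores[i])

def cheapest_sufficient_router(pred_scores, costs, budget, threshold):
    """Decision-aware routing (simplified stand-in): among affordable
    models predicted to be good enough, pick the cheapest, restoring
    the role of smaller models."""
    candidates = [i for i, c in enumerate(costs)
                  if c <= budget and pred_scores[i] >= threshold]
    if not candidates:  # no model clears the bar: fall back to best score
        return scalar_score_router(pred_scores, costs, budget)
    return min(candidates, key=lambda i: costs[i])

costs = [1.0, 10.0]          # small model, large model (hypothetical prices)
noisy_pred = [0.78, 0.82]    # both adequate; predictions carry small errors

pick_scalar = scalar_score_router(noisy_pred, costs, budget=10.0)    # -> 1
pick_rank = cheapest_sufficient_router(noisy_pred, costs,
                                       budget=10.0, threshold=0.75)  # -> 0
```

Here the scalar router always escalates to the expensive model (index 1), while the decision-aware rule routes the easy query to the cheap model (index 0), which is the behavior the paper aims to restore.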
Related papers
- Routing, Cascades, and User Choice for LLMs [9.28138618885869]
We study the effect of LLM routing with respect to user behavior. We propose a game between an LLM provider with two models and a user who can re-prompt or abandon tasks. The user's goal is to maximize their utility minus the delay from using the model, while the provider minimizes the cost of servicing the user.
arXiv Detail & Related papers (2026-02-10T15:39:31Z) - Dr.LLM: Dynamic Layer Routing in LLMs [55.11953638340419]
Dr.LLM is a retrofittable framework that equips pretrained models with lightweight per-layer routers deciding to skip, execute, or repeat a block. On ARC (logic) and DART (math), Dr.LLM improves accuracy by up to +3.4%p while saving 5 layers per example on average.
arXiv Detail & Related papers (2025-10-14T17:51:26Z) - ProxRouter: Proximity-Weighted LLM Query Routing for Improved Robustness to Outliers [14.831117443453165]
Large language model (LLM) query routers are critical to modern AI platforms. We propose ProxRouter, which applies an exponentially tilted aggregation mechanism to balance bias and variance in nonparametric routers.
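One way to picture an exponentially tilted aggregation in a nonparametric router is a sketch like the following; the function names, the distance-based weighting, and the toy data are assumptions for illustration, not ProxRouter's published mechanism.

```python
import numpy as np

def tilted_weights(dists, tilt):
    # Exponentially tilted weights over neighbor distances: a larger tilt
    # concentrates mass on the nearest stored queries (lower bias, higher
    # variance); tilt -> 0 recovers a uniform average (the reverse trade-off).
    w = np.exp(-tilt * np.asarray(dists, dtype=float))
    return w / w.sum()

def route(query_emb, bank_embs, bank_scores, tilt=2.0):
    # bank_scores[j, m]: observed quality of model m on stored query j.
    d = np.linalg.norm(bank_embs - query_emb, axis=1)
    est = tilted_weights(d, tilt) @ bank_scores  # per-model quality estimate
    return int(np.argmax(est))

bank_embs = np.array([[0.0, 0.0], [5.0, 5.0]])
bank_scores = np.array([[0.9, 0.2],    # near stored queries: model 0 wins
                        [0.1, 0.8]])   # far stored queries: model 1 wins
choice = route(np.array([0.1, 0.0]), bank_embs, bank_scores)  # -> 0
```

For the near query above, almost all tilted weight falls on the first stored query, so the router picks model 0; an outlier query far from both would receive a more uniform mixture, which is the bias–variance balance the tilt parameter controls.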
arXiv Detail & Related papers (2025-10-10T20:28:14Z) - xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning [104.63494870852894]
We present xRouter, a tool-calling-based routing system in which a learned router can either answer directly or invoke one or more external models. Our implementation encompasses the full reinforcement learning framework, including reward and cost accounting. Across diverse benchmarks, xRouter achieves strong cost-performance trade-offs.
arXiv Detail & Related papers (2025-10-09T16:52:01Z) - Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs [69.2486294522259]
BaRP is a bandit-feedback routing-with-preferences approach that trains under the same partial-feedback restriction as deployment. Framed as a contextual bandit over prompt features and a user preference vector, the method simulates an online feedback setting during training and adapts its routing decisions to each new prompt.
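The contextual-bandit framing above can be sketched as follows. The epsilon-greedy exploration, the linear scorers, and the squared-error update are illustrative stand-ins, not BaRP's actual policy or training rule; they only show routing over concatenated prompt features and a preference vector under partial feedback.

```python
import numpy as np

def bandit_route(prompt_feats, pref, weights, eps, rng):
    # One linear scorer per candidate model over [prompt features; preference
    # vector]; epsilon-greedy exploration, as in a contextual bandit.
    z = np.concatenate([prompt_feats, pref])
    if rng.random() < eps:
        return int(rng.integers(len(weights)))
    return int(np.argmax([w @ z for w in weights]))

def update(weights, arm, z, observed_reward, lr=0.1):
    # Partial feedback: only the routed model's reward is observed,
    # so only its scorer gets a (squared-error) gradient step.
    err = observed_reward - weights[arm] @ z
    weights[arm] = weights[arm] + lr * err * z
    return weights

rng = np.random.default_rng(0)
weights = [np.array([0.0, 0.0, 1.0]),   # model 0 favored by this preference
           np.array([0.0, 0.0, 0.0])]
pref = np.array([1.0])                  # user leans toward quality over cost
choice = bandit_route(np.array([0.2, 0.5]), pref, weights,
                      eps=0.0, rng=rng)  # greedy -> 0
```

Because only the chosen arm's reward is revealed, the same policy can serve many quality–cost trade-offs simply by changing the preference component of the context at inference time.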
arXiv Detail & Related papers (2025-10-08T18:24:59Z) - Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning [27.70756702796812]
We present Router-R1, a reinforcement learning framework that formulates multi-LLM routing and aggregation as a sequential decision process. To facilitate learning, we employ a lightweight rule-based reward comprising format rewards, final outcome rewards, and a novel cost reward for optimizing the balance between performance and cost.
arXiv Detail & Related papers (2025-06-10T17:56:45Z) - How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance. We propose the DSC benchmark: Diverse, Simple, and Categorized, an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv Detail & Related papers (2025-03-20T19:52:30Z) - OmniRouter: Budget and Performance Controllable Multi-LLM Routing [31.60019342381251]
Large language models (LLMs) deliver superior performance but require substantial computational resources and operate with relatively low efficiency. We introduce OmniRouter, a controllable routing framework for multi-LLM serving. Experiments show that OmniRouter achieves up to 6.30% improvement in response accuracy while simultaneously reducing computational costs by at least 10.15%.
arXiv Detail & Related papers (2025-02-27T22:35:31Z) - MasRouter: Learning to Route LLMs for Multi-Agent Systems [14.029698552632107]
Multi-agent systems powered by Large Language Models (LLMs) have been demonstrated to push the boundaries of LLM capabilities. Current routing methods effectively reduce overhead in single-agent scenarios by customizing LLM selection for each query. We first introduce the problem of Multi-Agent Routing System (MASR), which integrates all components of MAS into a unified routing framework. MasRouter is (1) high-performing, achieving a 1.8%–8.2% improvement over the state-of-the-art method on MBPP; (2) economical, reducing overhead by up to 52.07% compared to S
arXiv Detail & Related papers (2025-02-16T14:00:59Z) - Universal Model Routing for Efficient LLM Inference [69.86195589350264]
Model routing is a technique for reducing the inference cost of large language models (LLMs). We propose UniRoute, a new approach to the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We show that these are estimates of a theoretically optimal routing rule, and quantify their errors via an excess risk bound.
arXiv Detail & Related papers (2025-02-12T20:30:28Z) - HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts [34.08858035082419]
This work introduces HyperRouter, which dynamically generates the router's parameters through a fixed hypernetwork and trainable embeddings.
Experiments across a wide range of tasks demonstrate the superior performance and efficiency gains of HyperRouter.
arXiv Detail & Related papers (2023-12-12T07:40:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.