ProxRouter: Proximity-Weighted LLM Query Routing for Improved Robustness to Outliers
- URL: http://arxiv.org/abs/2510.09852v1
- Date: Fri, 10 Oct 2025 20:28:14 GMT
- Title: ProxRouter: Proximity-Weighted LLM Query Routing for Improved Robustness to Outliers
- Authors: Shivam Patel, Neharika Jali, Ankur Mallick, Gauri Joshi
- Abstract summary: Large language model (LLM) query routers are critical to modern AI platforms. We propose ProxRouter, which applies an exponentially tilted aggregation mechanism to balance bias and variance in nonparametric routers.
- Score: 14.831117443453165
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language model (LLM) query routers are critical to modern AI platforms as they seek to improve efficiency by assigning inference queries to accurate, yet low-cost models. Parametric routers typically use trained neural networks for LLM selection but suffer from retraining and maintenance overheads. Nonparametric routers are training-free, instead estimating LLM accuracy and cost via similarity between encodings of the input query and training set queries. However, like their parametric counterparts, nonparametric routers struggle to generalize to outlier queries, an issue exacerbated by limited diversity in training sets which are costly to expand and difficult to keep current with ever-evolving use cases. We propose ProxRouter, which applies an exponentially tilted aggregation mechanism to balance bias and variance in nonparametric routers, improving their robustness to outliers. Experiments show ProxRouter enhances outlier routing while preserving inlier performance with minimal overhead.
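The abstract describes nonparametric routing as estimating each LLM's accuracy and cost from the similarity between the query's encoding and training-set encodings, with ProxRouter's exponential tilt trading off bias against variance. The paper's exact formulation is not reproduced here; the following is a minimal illustrative sketch (the function name, the tilt parameter `beta`, and the cost trade-off weight are all assumptions):

```python
import numpy as np

def route_query(query_emb, train_embs, train_acc, train_cost, beta=5.0):
    """Illustrative proximity-weighted nonparametric routing sketch.

    train_acc / train_cost: (n_train, n_models) observed per-model accuracy
    and cost on the training queries. beta tilts the aggregation:
    beta -> 0 recovers the global mean (high bias, low variance), while a
    large beta approaches nearest-neighbor (low bias, high variance).
    """
    # Cosine similarity between the query and every training query.
    sims = train_embs @ query_emb / (
        np.linalg.norm(train_embs, axis=1) * np.linalg.norm(query_emb) + 1e-12)
    # Exponentially tilted weights (shifted by the max for numerical stability).
    w = np.exp(beta * (sims - sims.max()))
    w /= w.sum()
    est_acc = w @ train_acc    # per-model accuracy estimate for this query
    est_cost = w @ train_cost  # per-model cost estimate for this query
    score = est_acc - 0.1 * est_cost  # toy accuracy-cost trade-off
    return int(np.argmax(score))
```

The tilt makes the bias-variance trade-off explicit: outlier queries, which are far from all training points, receive flatter weights and thus fall back toward a global average rather than trusting a single distant neighbor.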
Related papers
- When Routing Collapses: On the Degenerate Convergence of LLM Routers [46.01380774114097]
As the user's cost budget increases, routers systematically default to the most capable and most expensive model. We propose Equi, a decision-aware router that directly learns model rankings. On RouterBench, Equi reduces cost by about 17% at GPT-4-level performance compared to the strongest prior router.
arXiv Detail & Related papers (2026-02-03T12:51:55Z) - Dr.LLM: Dynamic Layer Routing in LLMs [55.11953638340419]
Dr.LLM is a retrofittable framework that equips pretrained models with lightweight per-layer routers that decide whether to skip, execute, or repeat a block. On ARC (logic) and DART (math), Dr.LLM improves accuracy by up to +3.4%p while saving 5 layers per example on average.
arXiv Detail & Related papers (2025-10-14T17:51:26Z) - xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning [104.63494870852894]
We present xRouter, a tool-calling-based routing system in which a learned router can either answer directly or invoke one or more external models. Our implementation encompasses the full reinforcement learning framework, including reward and cost accounting. Across diverse benchmarks, xRouter achieves strong cost-performance trade-offs.
arXiv Detail & Related papers (2025-10-09T16:52:01Z) - Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs [69.2486294522259]
BaRP is a preference-aware bandit-feedback routing approach that trains under the same partial-feedback restriction as deployment. Framed as a contextual bandit over prompt features and a user preference vector, it simulates an online feedback setting during training and adapts its routing decisions to each new prompt.
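The contextual-bandit framing above (choose one model per prompt, observe only that model's outcome) can be illustrated with a toy epsilon-greedy router; this is a generic sketch, not BaRP's actual training objective, and all names and hyperparameters here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

class EpsGreedyRouter:
    """Toy contextual-bandit LLM router with a linear reward model per arm.

    The context is the prompt-feature vector (optionally concatenated with a
    user preference vector). Feedback is partial: only the chosen model's
    reward is observed, as in deployment.
    """

    def __init__(self, n_models, dim, eps=0.1, lr=0.05):
        self.W = np.zeros((n_models, dim))  # one linear reward model per arm
        self.eps, self.lr = eps, lr

    def choose(self, ctx):
        # Explore uniformly with probability eps, otherwise exploit.
        if rng.random() < self.eps:
            return int(rng.integers(len(self.W)))
        return int(np.argmax(self.W @ ctx))

    def update(self, arm, ctx, reward):
        # SGD step on the squared error of the chosen arm only.
        pred = self.W[arm] @ ctx
        self.W[arm] += self.lr * (reward - pred) * ctx
```

Under this framing, a deployment-matched trainer only ever sees (context, chosen arm, reward) triples, never counterfactual rewards for the models it did not call.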
arXiv Detail & Related papers (2025-10-08T18:24:59Z) - From Score Distributions to Balance: Plug-and-Play Mixture-of-Experts Routing [52.01745035243826]
Mixture-of-Experts (MoE) models scale parameter capacity by routing each token to a subset of experts. However, conditional routing shifts the burden to inference memory, limiting the number of experts per device. We present LASER, a plug-and-play, inference-time routing algorithm that balances load while preserving accuracy.
arXiv Detail & Related papers (2025-09-29T16:29:17Z) - Cost-Aware Contrastive Routing for LLMs [56.94921736486255]
We introduce Cost-Spectrum Contrastive Routing (CSCR), a lightweight framework that maps both prompts and models into a shared embedding space. CSCR consistently outperforms baselines, improving the accuracy-cost trade-off by up to 25%.
arXiv Detail & Related papers (2025-08-17T20:16:44Z) - RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing [27.481573948464987]
RadialRouter is a novel framework for routing queries among large language models. It uses a lightweight Transformer-based backbone with a radial structure, named RadialFormer, to articulate the query-LLM relationship. It significantly outperforms existing routing methods, by 9.2% and 5.8% in the Balance and Cost First scenarios respectively.
arXiv Detail & Related papers (2025-06-04T12:16:41Z) - IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory [26.39979967537193]
Large language models (LLMs) have demonstrated exceptional performance across a wide range of natural language tasks. While powerful models deliver better results, they come at a high cost, whereas smaller models are more cost-effective but less capable. We propose IRT-Router, a multi-LLM routing framework that efficiently routes user queries to the most suitable LLM.
arXiv Detail & Related papers (2025-06-01T15:14:58Z) - SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context [39.19789380714972]
Large language models excel at many tasks but often incur high inference costs during deployment. We propose an extremely simple yet effective routing framework for KG-RAG that efficiently balances performance and cost in a plug-and-play manner.
arXiv Detail & Related papers (2025-05-28T14:45:56Z) - How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance. We propose the DSC benchmark (Diverse, Simple, and Categorized), an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv Detail & Related papers (2025-03-20T19:52:30Z) - Universal Model Routing for Efficient LLM Inference [69.86195589350264]
Model routing is a technique for reducing the inference cost of large language models (LLMs). We propose UniRoute, a new approach to dynamic routing in which new, previously unobserved LLMs are available at test time. We show that these are estimates of a theoretically optimal routing rule, and quantify their errors via an excess risk bound.
arXiv Detail & Related papers (2025-02-12T20:30:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.