Adaptive LLM Routing under Budget Constraints
- URL: http://arxiv.org/abs/2508.21141v2
- Date: Tue, 09 Sep 2025 09:54:15 GMT
- Title: Adaptive LLM Routing under Budget Constraints
- Authors: Pranoy Panda, Raghav Magazine, Chaitanya Devaguptapu, Sho Takemori, Vishal Sharma,
- Abstract summary: Large Language Models (LLMs) have revolutionized natural language processing, but their varying capabilities and costs pose challenges in practical applications. Previous approaches treat this as a supervised learning problem, assuming complete knowledge of optimal query-LLM pairings. We propose to study LLM routing as a contextual bandit problem, enabling adaptive decision-making using bandit feedback.
- Score: 12.432635540782874
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) have revolutionized natural language processing, but their varying capabilities and costs pose challenges in practical applications. LLM routing addresses this by dynamically selecting the most suitable LLM for each query/task. Previous approaches treat this as a supervised learning problem, assuming complete knowledge of optimal query-LLM pairings. However, real-world scenarios lack such comprehensive mappings and face evolving user queries. We thus propose to study LLM routing as a contextual bandit problem, enabling adaptive decision-making using bandit feedback without requiring exhaustive inference across all LLMs for all queries (in contrast to supervised routing). To address this problem, we develop a shared embedding space for queries and LLMs, where query and LLM embeddings are aligned to reflect their affinity. This space is initially learned from offline human preference data and refined through online bandit feedback. We instantiate this idea through Preference-prior Informed Linucb fOr adaptive rouTing (PILOT), a novel extension of LinUCB. To handle diverse user budgets for model routing, we introduce an online cost policy modeled as a multi-choice knapsack problem, ensuring resource-efficient routing.
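To make the routing idea concrete, below is a minimal, illustrative sketch (not the authors' released code) of a LinUCB-style contextual bandit that picks an LLM per query under a cost budget. The embedding dimension, per-model costs, toy reward signal, and greedy budget filter are all assumptions made for the example; PILOT additionally initializes a shared query-LLM embedding space from offline human preference data and handles budgets via a multi-choice knapsack formulation, neither of which is reproduced here.

```python
"""Illustrative sketch of budget-aware LLM routing with a LinUCB-style bandit.
All names, dimensions, and cost values are hypothetical placeholders."""
import numpy as np


class LinUCBRouter:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                       # exploration strength
        # One ridge-regression model per LLM "arm": A is d x d, b is d.
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def ucb_scores(self, x):
        """Upper-confidence-bound score of each LLM for query features x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                    # estimated reward weights
            mean = theta @ x
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(mean + bonus)
        return np.array(scores)

    def select(self, x, costs, budget_per_query):
        """Pick the highest-UCB arm whose cost fits the budget.
        (A greedy stand-in for the paper's multi-choice knapsack policy.)"""
        scores = self.ucb_scores(x)
        affordable = [i for i, c in enumerate(costs) if c <= budget_per_query]
        if not affordable:                       # fall back to the cheapest LLM
            return int(np.argmin(costs))
        return max(affordable, key=lambda i: scores[i])

    def update(self, arm, x, reward):
        """Bandit feedback: observed quality of the chosen LLM's answer."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_llms = 16, 3
    costs = [0.2, 1.0, 5.0]                      # hypothetical cost per query
    router = LinUCBRouter(n_llms, dim)
    for t in range(1000):
        x = rng.normal(size=dim)                 # stand-in for a query embedding
        arm = router.select(x, costs, budget_per_query=2.0)
        reward = float(rng.random() < 0.5 + 0.1 * arm)  # toy quality signal
        router.update(arm, x, reward)
```

In this sketch each LLM keeps its own ridge-regression statistics, so the paper's shared query-LLM embedding space is only approximated by feeding the same query features to every arm, and the per-query cost threshold merely mimics the effect of the online knapsack policy applied over a whole query stream.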
Related papers
- Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs [69.2486294522259]
BaRP is a Bandit Routing-feedback with Preferences approach that trains under the same partial-feedback restriction as deployment. Framed as a contextual bandit over prompt features and a user preference vector, our method simulates an online feedback setting during training and adapts its routing decisions to each new prompt.
arXiv Detail & Related papers (2025-10-08T18:24:59Z)
- Online Multi-LLM Selection via Contextual Bandits under Unstructured Context Evolution [31.385024956599676]
Large language models (LLMs) exhibit diverse response behaviors, costs, and strengths. We develop a LinUCB-based algorithm that provably achieves sublinear regret without relying on future context prediction. Our algorithms are theoretically grounded and require no offline fine-tuning or dataset-specific training.
arXiv Detail & Related papers (2025-06-21T10:01:46Z)
- RAGRouter: Learning to Route Queries to Multiple Retrieval-Augmented Language Models [45.58601993849455]
Retrieval-Augmented Generation (RAG) significantly improves the performance of Large Language Models (LLMs) on knowledge-intensive tasks. We observe that external documents dynamically affect LLMs' ability to answer queries, while existing routing methods exhibit suboptimal performance in RAG scenarios. We propose RAGRouter, a parametric RAG-aware routing design that leverages document embeddings and RAG capability embeddings with contrastive learning to capture knowledge representation shifts.
arXiv Detail & Related papers (2025-05-29T03:44:56Z)
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression. LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z)
- Universal Model Routing for Efficient LLM Inference [69.86195589350264]
Model routing is a technique for reducing the inference cost of large language models (LLMs). We propose UniRoute, a new approach to the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We show that these are estimates of a theoretically optimal routing rule, and quantify their errors via an excess risk bound.
arXiv Detail & Related papers (2025-02-12T20:30:28Z)
- LLM Bandit: Cost-Efficient LLM Generation via Preference-Conditioned Dynamic Routing [3.090041654375235]
We present a novel framework that formulates the LLM selection process as a multi-armed bandit problem. Our approach incorporates a preference-conditioned dynamic routing mechanism, allowing users to specify their preferences at inference time. Our method achieves significant improvements in both accuracy and cost-effectiveness across various LLM platforms.
arXiv Detail & Related papers (2025-02-04T22:09:43Z)
- PickLLM: Context-Aware RL-Assisted Large Language Model Routing [0.5325390073522079]
PickLLM is a lightweight framework that relies on Reinforcement Learning (RL) to route on-the-fly queries to available models. We demonstrate the speed of convergence for different learning rates and improvement in hard metrics such as cost per querying session and overall response latency.
arXiv Detail & Related papers (2024-12-12T06:27:12Z)
- Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval [55.63711219190506]
Large language models (LLMs) often struggle with posing the right search queries.
We introduce Learning to Retrieve by Trying (LeReT).
LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%.
arXiv Detail & Related papers (2024-10-30T17:02:54Z)
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)