Budget-Aware Agentic Routing via Boundary-Guided Training
- URL: http://arxiv.org/abs/2602.21227v1
- Date: Wed, 04 Feb 2026 07:39:27 GMT
- Title: Budget-Aware Agentic Routing via Boundary-Guided Training
- Authors: Caiqi Zhang, Menglin Xia, Xuchao Zhang, Daniel Madrigal, Ankur Mallick, Samuel Kessler, Victor Ruehle, Saravan Rajmohan,
- Abstract summary: Budget-Aware Agentic Routing selects between a cheap and an expensive model at each step to optimize the cost-success frontier. Boundary-Guided Training builds a difficulty taxonomy to anchor learning under sparse rewards. Experimental results show that our method improves the efficiency frontier, matching strong routing baselines at substantially lower cost.
- Score: 24.0709108941881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As large language models (LLMs) evolve into autonomous agents that execute long-horizon workflows, invoking a high-capability model at every step becomes economically unsustainable. While model routing is effective for single-turn queries, agentic routing is a sequential, path-dependent problem: early mistakes compound, feedback often arrives only at the end of the episode, and deployments often demand strict per-task spending limits. We propose Budget-Aware Agentic Routing, which selects between a cheap and an expensive model at each step to optimize the cost-success frontier and to operate under strict per-task budgets. We propose Boundary-Guided Training, which leverages two boundary policies (always-small vs. always-large) to build a difficulty taxonomy and to anchor learning under sparse rewards. Our approach warm-starts with boundary-guided SFT data synthesis via stratified sampling of cost-efficient trajectories, then applies Boundary-Guided Policy Optimization (BoPO), combining boundary-relative rewards with a reference-guided advantage to avoid degenerate cheap-failure solutions. Experimental results show that our method improves the efficiency frontier, matching strong routing baselines at substantially lower cost while demonstrating generalization to strict inference-time budget constraints. Overall, our work establishes a foundational framework for agentic routing, shifting the paradigm from static model selection to dynamic, budget-aware sequential decision-making.
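The core routing decision described in the abstract (pick the expensive model only when a step warrants it and the remaining per-task budget allows it) can be sketched as follows. This is a minimal illustration, not the paper's method: the difficulty signal, cost values, and threshold are all hypothetical assumptions.

```python
def route_step(difficulty_score, remaining_budget,
               cost_large=1.0, cost_small=0.1, threshold=0.5):
    """Pick the expensive model only when the step looks hard and the
    remaining per-task budget can still cover its cost."""
    if difficulty_score > threshold and remaining_budget >= cost_large:
        return "large", remaining_budget - cost_large
    return "small", remaining_budget - cost_small

def run_episode(step_difficulties, budget):
    """Route every step of an episode under a strict per-task budget."""
    choices = []
    for d in step_difficulties:
        model, budget = route_step(d, budget)
        choices.append(model)
    return choices, budget
```

Note how the budget check forces a fallback to the cheap model late in an episode even for hard steps, which is exactly the degenerate cheap-failure regime the paper's boundary-relative rewards are designed to penalize during training.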
Related papers
- Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory [56.0946692457838]
BudgetMem is a runtime agent memory framework for explicit, query-aware performance-cost control. A lightweight router performs budget-tier routing across modules to balance task performance and memory construction cost. Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized.
arXiv Detail & Related papers (2026-02-05T18:57:09Z) - TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks [26.198066761026297]
Current methods assign entire queries to one model, treating all reasoning steps as equally difficult. We instead route individual steps of multi-step reasoning tasks to models of different capability. We develop several strategies for this, ranging from a simple threshold to more expressive routing policies.
arXiv Detail & Related papers (2026-01-15T10:06:06Z) - EvoRoute: Experience-Driven Self-Routing LLM Agent Systems [100.64399490164959]
EvoRoute is a self-evolving model routing paradigm that transcends static, pre-defined model assignments. Experiments on challenging agentic benchmarks demonstrate that EvoRoute, when integrated into off-the-shelf agentic systems, not only sustains or enhances system performance but also reduces execution cost by up to 80% and latency by over 70%.
arXiv Detail & Related papers (2026-01-06T04:06:46Z) - CONCUR: A Framework for Continual Constrained and Unconstrained Routing [79.85419373937765]
AI tasks differ in complexity and are best addressed with different computation strategies. Most prior methods build the routing framework by training a single model across all strategies. We propose CONCUR, a continual routing framework that supports both constrained and unconstrained routing.
arXiv Detail & Related papers (2025-12-10T07:30:13Z) - xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning [104.63494870852894]
We present xRouter, a tool-calling-based routing system in which a learned router can either answer directly or invoke one or more external models. Our implementation encompasses the full reinforcement learning framework, including reward and cost accounting. Across diverse benchmarks, xRouter achieves strong cost-performance trade-offs.
arXiv Detail & Related papers (2025-10-09T16:52:01Z) - SATER: A Self-Aware and Token-Efficient Approach to Routing and Cascading [39.20076289493037]
We introduce SATER, a dual-mode compatible approach that fine-tunes models through shortest-response preference optimization and a confidence-aware rejection mechanism. SATER significantly reduces redundant outputs and response times, while improving both the performance of pre-generation routing and the efficiency of cascade routing.
arXiv Detail & Related papers (2025-10-04T19:55:36Z) - Keeping Up with the Models: Online Deployment and Routing of LLMs at Scale [6.911384287238722]
We present a hierarchical algorithm that selects up to $M_{\max}$ models for the next stage using reward upper-confidence and cost lower-confidence bounds. We prove that StageRoute achieves a regret of order $T^{2/3}$ and provide a matching lower bound, thereby establishing its near-optimality.
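The selection rule described above (optimistic reward estimates, pessimistic cost estimates, keep at most $M_{\max}$ models) can be illustrated with a small sketch. The confidence-bonus formula and all names here are generic UCB-style assumptions, not the paper's exact StageRoute algorithm.

```python
import math

def select_models(stats, t, budget_per_query, m_max):
    """stats maps model name -> (mean_reward, mean_cost, n_pulls).
    Rank by a reward upper-confidence bound, screen by a cost
    lower-confidence bound, and keep at most m_max models."""
    scored = []
    for model, (mu_r, mu_c, n) in stats.items():
        bonus = math.sqrt(2 * math.log(t) / max(n, 1))
        reward_ucb = mu_r + bonus          # optimism on reward
        cost_lcb = max(mu_c - bonus, 0.0)  # optimism on cost (lower bound)
        if cost_lcb <= budget_per_query:   # keep only plausibly affordable models
            scored.append((reward_ucb, model))
    scored.sort(reverse=True)
    return [m for _, m in scored[:m_max]]
```

Screening on the cost lower bound is deliberately permissive: a model is excluded only when even its optimistic cost estimate exceeds the budget.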
arXiv Detail & Related papers (2025-06-08T12:25:26Z) - Scalable Chain of Thoughts via Elastic Reasoning [61.75753924952059]
Elastic Reasoning is a novel framework for scalable chain of thoughts. It separates reasoning into two phases, thinking and solution, with independently allocated budgets. This approach produces more concise and efficient reasoning even in unconstrained settings.
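The two-phase allocation idea can be shown with a toy sketch: the token budget is split up front so that truncating the thinking trace never starves the final answer. The split ratio and function name are illustrative assumptions, not Elastic Reasoning's actual mechanism.

```python
def split_budget(total_tokens, think_frac=0.7):
    """Allocate independent token budgets to the thinking and solution
    phases, so each phase has a guaranteed share of the total."""
    think = round(total_tokens * think_frac)
    return think, total_tokens - think
```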
arXiv Detail & Related papers (2025-05-08T15:01:06Z) - From Restless to Contextual: A Thresholding Bandit Reformulation For Finite-horizon Performance [8.173852377640964]
We introduce a reformulation of online restless bandits (RBs) as a budgeted thresholding contextual bandit. We prove the first non-asymptotic optimality of an oracle policy for a simplified finite-horizon setting. Our work provides a new pathway for achieving practical, sample-efficient learning in finite-horizon RBs.
arXiv Detail & Related papers (2025-02-07T18:23:43Z) - Self-Regulation and Requesting Interventions [63.5863047447313]
We propose an offline framework that trains a "helper" policy to request interventions. We score optimal intervention timing with PRMs and train the helper model on these labeled trajectories. This offline approach significantly reduces costly intervention calls during training.
arXiv Detail & Related papers (2025-02-07T00:06:17Z) - An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics.
First, we propose a direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process.
Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z) - Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning [11.666700714916065]
Constrained RL is a framework for enforcing safe actions in Reinforcement Learning.
Most recent approaches for solving Constrained RL convert the trajectory-based cost constraint into a surrogate problem.
We present an approach that does not modify the trajectory-based cost constraint and instead imitates "good" trajectories.
arXiv Detail & Related papers (2023-12-16T08:48:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.