Budget-Aware Agentic Routing via Boundary-Guided Training
- URL: http://arxiv.org/abs/2602.21227v1
- Date: Wed, 04 Feb 2026 07:39:27 GMT
- Title: Budget-Aware Agentic Routing via Boundary-Guided Training
- Authors: Caiqi Zhang, Menglin Xia, Xuchao Zhang, Daniel Madrigal, Ankur Mallick, Samuel Kessler, Victor Ruehle, Saravan Rajmohan,
- Abstract summary: Budget-Aware Agentic Routing selects between a cheap and an expensive model at each step to optimize the cost-success frontier. Boundary-Guided Training builds a difficulty taxonomy to anchor learning under sparse rewards. Experimental results show that our method improves the efficiency frontier, matching strong routing baselines at substantially lower cost.
- Score: 24.0709108941881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As large language models (LLMs) evolve into autonomous agents that execute long-horizon workflows, invoking a high-capability model at every step becomes economically unsustainable. While model routing is effective for single-turn queries, agentic routing is a sequential, path-dependent problem: early mistakes compound, feedback often arrives only at the end of the episode, and deployments often demand strict per-task spending limits. We propose Budget-Aware Agentic Routing, which selects between a cheap and an expensive model at each step to optimize the cost-success frontier and to operate under strict per-task budgets. We propose Boundary-Guided Training, which leverages two boundary policies (always-small vs. always-large) to build a difficulty taxonomy and to anchor learning under sparse rewards. Our approach warm-starts with boundary-guided SFT data synthesis via stratified sampling of cost-efficient trajectories, then applies Boundary-Guided Policy Optimization (BoPO), combining boundary-relative rewards with a reference-guided advantage to avoid degenerate cheap-failure solutions. Experimental results show that our method improves the efficiency frontier, matching strong routing baselines at substantially lower cost while demonstrating generalization to strict inference-time budget constraints. Overall, our work establishes a foundational framework for agentic routing, shifting the paradigm from static model selection to dynamic, budget-aware sequential decision-making.
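The core routing decision described in the abstract (pick the expensive model only when a step warrants it and the remaining per-task budget allows it) can be sketched as follows. This is a minimal illustration, not the paper's method: the difficulty signal, cost values, and threshold are all hypothetical assumptions.

```python
def route_step(difficulty_score, remaining_budget,
               cost_large=1.0, cost_small=0.1, threshold=0.5):
    """Pick the expensive model only when the step looks hard and the
    remaining per-task budget can still cover its cost."""
    if difficulty_score > threshold and remaining_budget >= cost_large:
        return "large", remaining_budget - cost_large
    return "small", remaining_budget - cost_small

def run_episode(step_difficulties, budget):
    """Route every step of an episode under a strict per-task budget."""
    choices = []
    for d in step_difficulties:
        model, budget = route_step(d, budget)
        choices.append(model)
    return choices, budget
```

Note how the budget check forces a fallback to the cheap model late in an episode even for hard steps, which is exactly the degenerate cheap-failure regime the paper's boundary-relative rewards are designed to penalize during training.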
Related papers
- Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory [56.0946692457838]
BudgetMem is a runtime agent memory framework for explicit, query-aware performance-cost control. A lightweight router performs budget-tier routing across modules to balance task performance and memory construction cost. Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized.
arXiv Detail & Related papers (2026-02-05T18:57:09Z) - TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks [26.198066761026297]
Current methods assign entire queries to one model, treating all reasoning steps as equally difficult. We instead route individual steps of multi-step reasoning tasks to models of different capability. We develop several strategies for this, ranging from a simple threshold to more expressive routing policies.
arXiv Detail & Related papers (2026-01-15T10:06:06Z) - EvoRoute: Experience-Driven Self-Routing LLM Agent Systems [100.64399490164959]
EvoRoute is a self-evolving model routing paradigm that transcends static, pre-defined model assignments. Experiments on challenging agentic benchmarks demonstrate that EvoRoute, when integrated into off-the-shelf agentic systems, not only sustains or enhances system performance but also reduces execution cost by up to 80% and latency by over 70%.
arXiv Detail & Related papers (2026-01-06T04:06:46Z) - CONCUR: A Framework for Continual Constrained and Unconstrained Routing [79.85419373937765]
AI tasks differ in complexity and are best addressed with different computation strategies. Most prior methods build the routing framework by training a single model across all strategies. We propose CONCUR, a continual routing framework that supports both constrained and unconstrained routing.
arXiv Detail & Related papers (2025-12-10T07:30:13Z) - xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning [104.63494870852894]
We present xRouter, a tool-calling-based routing system in which a learned router can either answer directly or invoke one or more external models. Our implementation encompasses the full reinforcement learning framework, including reward and cost accounting. Across diverse benchmarks, xRouter achieves strong cost-performance trade-offs.
arXiv Detail & Related papers (2025-10-09T16:52:01Z) - SATER: A Self-Aware and Token-Efficient Approach to Routing and Cascading [39.20076289493037]
We introduce SATER, a dual-mode compatible approach that fine-tunes models through shortest-response preference optimization and a confidence-aware rejection mechanism. SATER significantly reduces redundant outputs and response times, while improving both the performance of pre-generation routing and the efficiency of cascade routing.
arXiv Detail & Related papers (2025-10-04T19:55:36Z) - Keeping Up with the Models: Online Deployment and Routing of LLMs at Scale [6.911384287238722]
We present a hierarchical algorithm that selects up to $M_{\max}$ models for the next stage using reward upper-confidence and cost lower-confidence bounds. We prove that StageRoute achieves a regret of order $T^{2/3}$ and provide a matching lower bound, thereby establishing its near-optimality.
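The selection rule described above (optimistic reward estimates, pessimistic cost estimates, keep at most $M_{\max}$ models) can be illustrated with a small sketch. The confidence-bonus formula and all names here are generic UCB-style assumptions, not the paper's exact StageRoute algorithm.

```python
import math

def select_models(stats, t, budget_per_query, m_max):
    """stats maps model name -> (mean_reward, mean_cost, n_pulls).
    Rank by a reward upper-confidence bound, screen by a cost
    lower-confidence bound, and keep at most m_max models."""
    scored = []
    for model, (mu_r, mu_c, n) in stats.items():
        bonus = math.sqrt(2 * math.log(t) / max(n, 1))
        reward_ucb = mu_r + bonus          # optimism on reward
        cost_lcb = max(mu_c - bonus, 0.0)  # optimism on cost (lower bound)
        if cost_lcb <= budget_per_query:   # keep only plausibly affordable models
            scored.append((reward_ucb, model))
    scored.sort(reverse=True)
    return [m for _, m in scored[:m_max]]
```

Screening on the cost lower bound is deliberately permissive: a model is excluded only when even its optimistic cost estimate exceeds the budget.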
arXiv Detail & Related papers (2025-06-08T12:25:26Z) - Scalable Chain of Thoughts via Elastic Reasoning [61.75753924952059]
Elastic Reasoning is a novel framework for scalable chain of thoughts. It separates reasoning into two phases, thinking and solution, with independently allocated budgets. This approach produces more concise and efficient reasoning even in unconstrained settings.
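The two-phase allocation idea can be shown with a toy sketch: the token budget is split up front so that truncating the thinking trace never starves the final answer. The split ratio and function name are illustrative assumptions, not Elastic Reasoning's actual mechanism.

```python
def split_budget(total_tokens, think_frac=0.7):
    """Allocate independent token budgets to the thinking and solution
    phases, so each phase has a guaranteed share of the total."""
    think = round(total_tokens * think_frac)
    return think, total_tokens - think
```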
arXiv Detail & Related papers (2025-05-08T15:01:06Z) - From Restless to Contextual: A Thresholding Bandit Reformulation For Finite-horizon Performance [8.173852377640964]
We introduce a reformulation of online restless bandits (RBs) as a budgeted thresholding contextual bandit. We prove the first non-asymptotic optimality of an oracle policy for a simplified finite-horizon setting. Our work provides a new pathway for achieving practical, sample-efficient learning in finite-horizon RBs.
arXiv Detail & Related papers (2025-02-07T18:23:43Z) - Self-Regulation and Requesting Interventions [63.5863047447313]
We propose an offline framework that trains a "helper" policy to request interventions. We score optimal intervention timing with PRMs and train the helper model on these labeled trajectories. This offline approach significantly reduces costly intervention calls during training.
arXiv Detail & Related papers (2025-02-07T00:06:17Z) - An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics.
First, we propose a direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process.
Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z) - Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning [11.666700714916065]
Constrained RL is a framework for enforcing safe actions in Reinforcement Learning.
Most recent approaches for solving Constrained RL convert the trajectory-based cost constraint into a surrogate problem.
We present an approach that does not modify the trajectory-based cost constraint and instead imitates "good" trajectories.
arXiv Detail & Related papers (2023-12-16T08:48:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.