EvoRoute: Experience-Driven Self-Routing LLM Agent Systems
- URL: http://arxiv.org/abs/2601.02695v1
- Date: Tue, 06 Jan 2026 04:06:46 GMT
- Title: EvoRoute: Experience-Driven Self-Routing LLM Agent Systems
- Authors: Guibin Zhang, Haiyang Yu, Kaiming Yang, Bingli Wu, Fei Huang, Yongbin Li, Shuicheng Yan
- Abstract summary: EvoRoute is a self-evolving model routing paradigm that transcends static, pre-defined model assignments. Experiments on challenging agentic benchmarks demonstrate that EvoRoute, when integrated into off-the-shelf agentic systems, not only sustains or enhances system performance but also reduces execution cost by up to 80% and latency by over 70%.
- Score: 100.64399490164959
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Complex agentic AI systems, powered by a coordinated ensemble of Large Language Models (LLMs), tool and memory modules, have demonstrated remarkable capabilities on intricate, multi-turn tasks. However, this success is shadowed by prohibitive economic costs and severe latency, exposing a critical, yet underexplored, trade-off. We formalize this challenge as the Agent System Trilemma: the inherent tension among achieving state-of-the-art performance, minimizing monetary cost, and ensuring rapid task completion. To dismantle this trilemma, we introduce EvoRoute, a self-evolving model routing paradigm that transcends static, pre-defined model assignments. Leveraging an ever-expanding knowledge base of prior experience, EvoRoute dynamically selects Pareto-optimal LLM backbones at each step, balancing accuracy, efficiency, and resource use, while continually refining its own selection policy through environment feedback. Experiments on challenging agentic benchmarks such as GAIA and BrowseComp+ demonstrate that EvoRoute, when integrated into off-the-shelf agentic systems, not only sustains or enhances system performance but also reduces execution cost by up to 80% and latency by over 70%.
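The abstract's notion of selecting Pareto-optimal backbones can be illustrated with a minimal sketch. This is not EvoRoute's actual routing algorithm, which the abstract does not specify; it only shows what "Pareto-optimal" means across accuracy, cost, and latency. The `ModelStats` type, the model names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Hypothetical per-model statistics, e.g. aggregated from past runs."""
    name: str
    accuracy: float  # higher is better
    cost: float      # lower is better
    latency: float   # lower is better


def dominates(a: ModelStats, b: ModelStats) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    no_worse = (a.accuracy >= b.accuracy
                and a.cost <= b.cost
                and a.latency <= b.latency)
    strictly_better = (a.accuracy > b.accuracy
                       or a.cost < b.cost
                       or a.latency < b.latency)
    return no_worse and strictly_better


def pareto_front(candidates: list[ModelStats]) -> list[ModelStats]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative values only; they are not taken from the paper.
candidates = [
    ModelStats("large", accuracy=0.90, cost=10.0, latency=30.0),
    ModelStats("medium", accuracy=0.85, cost=3.0, latency=12.0),
    ModelStats("small", accuracy=0.70, cost=1.0, latency=5.0),
    ModelStats("slow-small", accuracy=0.65, cost=1.5, latency=8.0),
]
front = pareto_front(candidates)
print([m.name for m in front])
```

Here "slow-small" is dropped because "small" beats it on all three axes, while the other three remain because each one trades accuracy against cost and latency differently.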
Related papers
- M$^2$: Dual-Memory Augmentation for Long-Horizon Web Agents via Trajectory Summarization and Insight Retrieval [64.06936170117943]
M$^2$ is a training-free, memory-augmented framework designed to optimize context efficiency and decision-making. Our approach incorporates a dual-tier memory mechanism that synergizes Dynamic Trajectory Summarization (Internal Memory) to compress verbose interaction history into concise state updates, and Insight Retrieval Augmentation (External Memory) to guide the agent with actionable guidelines retrieved from an offline insight bank.
arXiv Detail & Related papers (2026-02-28T06:59:51Z)
- CASTER: Breaking the Cost-Performance Barrier in Multi-Agent Orchestration via Context-Aware Strategy for Task Efficient Routing [25.48759875572515]
CASTER (Context-Aware Strategy for Task Efficient Routing) is a lightweight router for dynamic model selection in graph-based MAS. CASTER reduces inference cost by up to 72.4% compared to strong-model baselines.
arXiv Detail & Related papers (2026-01-27T16:52:47Z)
- Towards Resource-Efficient Multimodal Intelligence: Learned Routing among Specialized Expert Models [0.0]
Large language models (LLMs) increasingly power vision, audio, and document understanding. Small open-source models offer cost advantages but struggle with complex or multimodal queries. We introduce a unified, modular framework that intelligently routes each query to the most fitting expert model.
arXiv Detail & Related papers (2025-11-09T16:14:56Z)
- Don't Just Fine-tune the Agent, Tune the Environment [25.7349297100143]
Supervised fine-tuning on synthetic data leads to overfitting. Standard reinforcement learning struggles with a critical cold-start problem and training instability. Our work presents a paradigm shift from supervised fine-tuning on static trajectories to dynamic, environment-based exploration.
arXiv Detail & Related papers (2025-10-11T12:35:15Z)
- xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning [104.63494870852894]
We present xRouter, a tool-calling-based routing system in which a learned router can either answer directly or invoke one or more external models. Our implementation encompasses the full reinforcement learning framework, including reward and cost accounting. Across diverse benchmarks, xRouter achieves strong cost-performance trade-offs.
arXiv Detail & Related papers (2025-10-09T16:52:01Z)
- SATER: A Self-Aware and Token-Efficient Approach to Routing and Cascading [39.20076289493037]
We introduce SATER, a dual-mode compatible approach that fine-tunes models through shortest-response preference optimization and a confidence-aware rejection mechanism. SATER significantly reduces redundant outputs and response times, while improving both the performance of pre-generation routing and the efficiency of cascade routing.
arXiv Detail & Related papers (2025-10-04T19:55:36Z)
- MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems [40.44248136759827]
We introduce MAS$^2$, a multi-agent system that autonomously architects bespoke multi-agent systems. MAS$^2$ achieves performance gains of up to 19.6% over state-of-the-art MAS.
arXiv Detail & Related papers (2025-09-29T06:20:10Z)
- Dynamic Speculative Agent Planning [57.630218933994534]
Large language-model-based agents face critical deployment challenges due to prohibitive latency and inference costs. We introduce Dynamic Speculative Planning (DSP), an online reinforcement learning framework that provides lossless acceleration with substantially reduced costs. Experiments on two standard agent benchmarks demonstrate that DSP achieves comparable efficiency to the fastest acceleration method while reducing total cost by 30% and unnecessary cost by up to 60%.
arXiv Detail & Related papers (2025-09-02T03:34:36Z)
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- MixLLM: Dynamic Routing in Mixed Large Language Models [57.309520357563215]
Large Language Models (LLMs) have recently shown potential for artificial general intelligence; however, their usage is costly, with high response latency. We develop MixLLM, a dynamic contextual-bandit-based routing system for query-LLM assignment.
arXiv Detail & Related papers (2025-02-09T02:26:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.