Related papers: Budget-Aware Tool-Use Enables Effective Agent Scaling


URL: http://arxiv.org/abs/2511.17006v1
Date: Fri, 21 Nov 2025 07:18:55 GMT
Title: Budget-Aware Tool-Use Enables Effective Agent Scaling
Authors: Tengxiao Liu, Zifeng Wang, Jin Miao, I-Hung Hsu, Jun Yan, Jiefeng Chen, Rujun Han, Fangyuan Xu, Yanfei Chen, Ke Jiang, Samira Daruki, Yi Liang, William Yang Wang, Tomas Pfister, Chen-Yu Lee,
Abstract summary: Scaling test-time computation improves performance across different tasks on large language models (LLMs). We study how to scale tool-augmented agents effectively under explicit tool-call budgets, focusing on web search agents. We introduce the Budget Tracker, a lightweight plug-in that provides the agent with continuous budget awareness.
Score: 82.6942342482552
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scaling test-time computation improves performance across different tasks on large language models (LLMs), which has also been extended to tool-augmented agents. For these agents, scaling involves not only "thinking" in tokens but also "acting" via tool calls. The number of tool calls directly bounds the agent's interaction with the external environment. However, we find that simply granting agents a larger tool-call budget fails to improve performance, as they lack "budget awareness" and quickly hit a performance ceiling. To address this, we study how to scale such agents effectively under explicit tool-call budgets, focusing on web search agents. We first introduce the Budget Tracker, a lightweight plug-in that provides the agent with continuous budget awareness, enabling simple yet effective scaling. We further develop BATS (Budget Aware Test-time Scaling), an advanced framework that leverages this awareness to dynamically adapt its planning and verification strategy, deciding whether to "dig deeper" on a promising lead or "pivot" to new paths based on remaining resources. To analyze cost-performance scaling in a controlled manner, we formalize a unified cost metric that jointly accounts for token and tool consumption. We provide the first systematic study on budget-constrained agents, showing that budget-aware methods produce more favorable scaling curves and push the cost-performance Pareto frontier. Our work offers empirical insights toward a more transparent and principled understanding of scaling in tool-augmented agents.
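The abstract describes budget awareness, the "dig deeper" versus "pivot" decision, and the unified cost metric only at a high level. The Python sketch below is a hypothetical illustration of those three ideas, not the authors' implementation. The class `BudgetTracker`, the `<budget>` status text, the linear form of `unified_cost` with per-token and per-call prices, and the threshold rule in `dig_or_pivot` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    """Hypothetical tracker that counts tool calls against an explicit budget."""
    max_tool_calls: int
    used_tool_calls: int = 0

    @property
    def remaining(self) -> int:
        return self.max_tool_calls - self.used_tool_calls

    def record_call(self) -> None:
        # Refuse any tool call beyond the explicit budget.
        if self.remaining <= 0:
            raise RuntimeError("tool-call budget exhausted")
        self.used_tool_calls += 1

    def status(self) -> str:
        # Assumed text appended to the agent's context after each step,
        # so the model always "sees" how much budget is left.
        return (f"<budget> used {self.used_tool_calls}/{self.max_tool_calls} "
                f"tool calls, {self.remaining} remaining </budget>")


def unified_cost(input_tokens: int, output_tokens: int, tool_calls: int,
                 price_in: float, price_out: float, price_tool: float) -> float:
    """Assumed linear cost: token spend plus a fixed price per tool call."""
    return (input_tokens * price_in
            + output_tokens * price_out
            + tool_calls * price_tool)


def dig_or_pivot(lead_confidence: float, tracker: BudgetTracker,
                 confidence_threshold: float = 0.5,
                 min_remaining: int = 2) -> str:
    """Toy rule: keep digging only if the lead looks promising and budget allows."""
    if lead_confidence >= confidence_threshold and tracker.remaining >= min_remaining:
        return "dig_deeper"
    return "pivot"


if __name__ == "__main__":
    tracker = BudgetTracker(max_tool_calls=5)
    for _ in range(3):
        tracker.record_call()
    print(tracker.status())  # <budget> used 3/5 tool calls, 2 remaining </budget>
    print(dig_or_pivot(0.8, tracker))  # dig_deeper
    print(unified_cost(1000, 200, 3, price_in=1e-6, price_out=4e-6, price_tool=0.01))
```

Keeping the tracker separate from the decision rule mirrors the abstract's split between a plug-in that only surfaces budget state and a framework (BATS) that acts on it; a real policy would presumably be driven by the model itself rather than a fixed threshold.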

Related papers

Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use [20.31276001607449]
We study budget-constrained tool-augmented agents, where a large language model must solve multi-step tasks by invoking external tools under a strict monetary budget. We propose INTENT, an inference-time planning framework that leverages an intention-aware hierarchical world model to anticipate future tool usage and risk-calibrated cost, and to guide decisions online.
arXiv (2026-02-12T04:01:30Z)
Active Learning Using Aggregated Acquisition Functions: Accuracy and Sustainability Analysis [14.398823059302279]
Active learning (AL) is a machine learning approach that strategically selects the most informative samples for annotation during training. This strategy not only reduces labeling expenses but also results in energy savings during neural network training. We implement and evaluate various state-of-the-art acquisition functions, analyzing their accuracy and computational costs.
arXiv (2026-02-07T08:42:12Z)
Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory [56.0946692457838]
BudgetMem is a runtime agent memory framework for explicit, query-aware performance-cost control. A lightweight router performs budget-tier routing across modules to balance task performance and memory construction cost. Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized.
arXiv (2026-02-05T18:57:09Z)
AutoTool: Efficient Tool Selection for Large Language Model Agents [10.061664247482488]
Large Language Model (LLM) agents have emerged as powerful tools for automating complex tasks by leveraging the reasoning and decision-making abilities of LLMs. However, a major bottleneck lies in the high inference cost of tool selection, especially in approaches like ReAct that repeatedly invoke the LLM to determine which tool to use at each step. We propose AutoTool, a novel graph-based framework that bypasses repeated LLM inference by exploiting a key empirical observation: tool usage inertia.
arXiv (2025-11-18T16:41:48Z)
JSPLIT: A Taxonomy-based Solution for Prompt Bloating in Model Context Protocol [1.2166472806042592]
We describe the design of the taxonomy, the tool selection algorithm, and a dataset used to evaluate JSPLIT. We show that JSPLIT significantly reduces prompt size without significantly compromising the agent's ability to respond effectively.
arXiv (2025-10-16T10:28:23Z)
The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective [3.0868637098088403]
Large-language-model (LLM)-based AI agents have recently showcased impressive versatility by employing dynamic reasoning. This paper presents the first comprehensive system-level analysis of AI agents, quantifying their resource usage, latency behavior, energy consumption, and test-time scaling strategies. Our findings reveal that while agents improve accuracy with increased compute, they suffer from rapidly diminishing returns, widening latency variance, and unsustainable infrastructure costs.
arXiv (2025-06-04T14:37:54Z)
Acting Less is Reasoning More! Teaching Model to Act Efficiently [87.28134636548705]
Tool-integrated reasoning augments large language models with the ability to invoke external tools to solve tasks. Current approaches typically optimize only for final correctness without considering the efficiency or necessity of external tool use. We propose a framework that encourages models to produce accurate answers with minimal tool calls. Our approach reduces tool calls by up to 68.3% and improves tool productivity by up to 215.4%, while maintaining comparable answer accuracy.
arXiv (2025-04-21T05:40:05Z)
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development. We present Dynamic Action Re-Sampling (DARS), a novel inference-time compute scaling approach for coding agents. We evaluate our approach on the SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv (2025-03-18T14:02:59Z)
Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use. MeCo quantifies metacognitive scores by capturing high-level cognitive signals in the representation space. MeCo is fine-tuning-free and incurs minimal cost.
arXiv (2025-02-18T15:45:01Z)
SMART: Self-Aware Agent for Tool Overuse Mitigation [58.748554080273585]
Current Large Language Model (LLM) agents demonstrate strong reasoning and tool use capabilities, but often lack self-awareness. This imbalance leads to Tool Overuse, where models unnecessarily rely on external tools for tasks they could solve with parametric knowledge. We introduce SMART (Strategic Model-Aware Reasoning with Tools), a paradigm that enhances an agent's self-awareness to optimize task handling and reduce tool overuse.
arXiv (2025-02-17T04:50:37Z)
Unpacking the Black Box: Regulating Algorithmic Decisions [1.283555556182245]
We propose a model of oversight over 'black-box' algorithms used in high-stakes applications such as lending, medical testing, or hiring. We show that allowing for complex algorithms can improve welfare, but the gains depend on how the regulator regulates them.
arXiv (2021-10-05T23:20:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.