Dynamic System Instructions and Tool Exposure for Efficient Agentic LLMs
- URL: http://arxiv.org/abs/2602.17046v1
- Date: Mon, 01 Dec 2025 06:43:43 GMT
- Title: Dynamic System Instructions and Tool Exposure for Efficient Agentic LLMs
- Authors: Uria Franko
- Abstract summary: Large Language Model (LLM) agents often run for many steps while re-ingesting long system instructions and large tool catalogs each turn. We propose Instruction-Tool Retrieval (ITR), a RAG variant that retrieves, per step, only the minimal system-prompt fragments and the smallest necessary subset of tools. ITR reduces per-step context tokens by 95%, improves correct tool routing by 32% relative, and cuts end-to-end episode cost by 70% versus a monolithic baseline.
- Score: 1.2691047660244335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Model (LLM) agents often run for many steps while re-ingesting long system instructions and large tool catalogs each turn. This increases cost, agent derailment probability, latency, and tool-selection errors. We propose Instruction-Tool Retrieval (ITR), a RAG variant that retrieves, per step, only the minimal system-prompt fragments and the smallest necessary subset of tools. ITR composes a dynamic runtime system prompt and exposes a narrowed toolset with confidence-gated fallbacks. Using a controlled benchmark with internally consistent numbers, ITR reduces per-step context tokens by 95%, improves correct tool routing by 32% relative, and cuts end-to-end episode cost by 70% versus a monolithic baseline. These savings enable agents to run 2-20x more loops within context limits. Savings compound with the number of agent steps, making ITR particularly valuable for long-running autonomous agents. We detail the method, evaluation protocol, ablations, and operational guidance for practical deployment.
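The per-step retrieval and confidence-gated fallback described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Item` type, the `build_step_context` function, the example fragments and tools, the Jaccard word-overlap score (a stand-in for whatever retriever a real system would use), and the `min_conf` threshold are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A retrievable unit: a system-prompt fragment or a tool description (hypothetical type)."""
    name: str
    text: str


def tokenize(s):
    """Lowercase word/number tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def score(query, item):
    """Jaccard word overlap; an assumed stand-in for embedding similarity."""
    q, t = tokenize(query), tokenize(item.text)
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)


def top_k(query, items, k):
    """Return the k highest-scoring (score, item) pairs, best first."""
    ranked = sorted(((score(query, it), it) for it in items),
                    key=lambda pair: pair[0], reverse=True)
    return ranked[:k]


def build_step_context(query, fragments, tools, k_frag=2, k_tool=2, min_conf=0.1):
    """Compose a per-step system prompt and a narrowed toolset.

    If the best tool score is below min_conf (an assumed threshold),
    fall back to exposing the full tool catalog.
    """
    frag_hits = top_k(query, fragments, k_frag)
    tool_hits = top_k(query, tools, k_tool)
    # Keep only fragments with some overlap with the query.
    prompt = "\n".join(it.text for s, it in frag_hits if s > 0)
    best = tool_hits[0][0] if tool_hits else 0.0
    if best < min_conf:
        return prompt, list(tools), True  # low confidence: expose everything
    return prompt, [it for s, it in tool_hits if s > 0], False


# Hypothetical catalog, for illustration only.
FRAGMENTS = [
    Item("refunds", "When handling a refund request, verify the order id first."),
    Item("tone", "Always answer politely and concisely."),
    Item("shipping", "For shipping questions, quote the carrier tracking status."),
]
TOOLS = [
    Item("issue_refund", "Issue a refund for an order id."),
    Item("track_shipment", "Look up carrier tracking status for a shipment."),
    Item("send_email", "Send an email to the customer."),
]

prompt, exposed, fell_back = build_step_context(
    "Please refund order 1234", FRAGMENTS, TOOLS)
print([t.name for t in exposed], fell_back)
```

In this toy setup a refund query exposes only the refund tool and the refund instruction, while a query that matches no tool description triggers the fallback and exposes the full catalog. Where to set the threshold, and what the fallback should expose, are design choices this sketch does not settle.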
Related papers
- Graph-Based Self-Healing Tool Routing for Cost-Efficient LLM Agents [0.0]
Self-Healing Router is a fault-tolerant orchestration architecture. It treats most agent control-flow decisions as routing rather than reasoning. Every failure is either a logged reroute or an explicit escalation, never a silent skip.
arXiv Detail & Related papers (2026-03-02T07:21:15Z) - Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents [35.76774274440008]
STING (Sequential Testing of Illicit N-step Goal execution) is an automated red-teaming framework. It constructs a step-by-step illicit plan grounded in a benign persona and iteratively probes a target agent with adaptive follow-ups. We introduce an analysis framework that models multi-turn red-teaming as a time-to-first-jailbreak random variable.
arXiv Detail & Related papers (2026-02-18T10:31:19Z) - DLLM Agent: See Farther, Run Faster [94.74432470237817]
Diffusion large language models (DLLMs) have emerged as an alternative to autoregressive (AR) decoding with appealing efficiency and modeling properties. We study this in a controlled setting by instantiating DLLM and AR backbones within the same agent workflow. We find that DLLM Agents are on average over 30% faster end to end than AR agents, with some cases exceeding 8x speedup.
arXiv Detail & Related papers (2026-02-07T09:01:18Z) - Optimizing Agentic Workflows using Meta-tools [3.3298825663516403]
Agentic AI enables LLMs to dynamically reason, plan, and interact with tools to solve complex tasks. This work introduces Agent Workflow Optimization (AWO), a framework that identifies and optimizes redundant tool execution patterns. AWO reduces the number of LLM calls by up to 11.9% while also increasing the task success rate by up to 4.2 percentage points.
arXiv Detail & Related papers (2026-01-29T17:43:08Z) - AutoTool: Efficient Tool Selection for Large Language Model Agents [10.061664247482488]
Large Language Model (LLM) agents have emerged as powerful tools for automating complex tasks by leveraging the reasoning and decision-making abilities of LLMs. However, a major bottleneck lies in the high inference cost of tool selection, especially in approaches like ReAct that repeatedly invoke the LLM to determine which tool to use at each step. We propose AutoTool, a novel graph-based framework that bypasses repeated LLM inference by exploiting a key empirical observation: tool usage inertia.
arXiv Detail & Related papers (2025-11-18T16:41:48Z) - Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks. Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses. No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z) - RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR is a role-aware context routing framework for multi-agent large language model (LLM) systems. It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage. A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
arXiv Detail & Related papers (2025-08-06T21:59:34Z) - Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments [54.67512489842682]
Large language models (LLMs) have demonstrated strong planning and decision-making capabilities in complex embodied environments. We take a first step toward exploring the early-exit behavior for LLM-based agents.
arXiv Detail & Related papers (2025-05-23T08:23:36Z) - Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning [68.00304954972232]
Multimodal agents, which integrate a controller (e.g., a vision language model) with external tools, have demonstrated remarkable capabilities in tackling complex multimodal tasks. Existing approaches for training these agents depend on extensive human-annotated task-answer pairs and tool trajectories. We propose SPORT, an iterative tool usage exploration method for multimodal agents that requires no pre-collected data. SPORT has four iterative components: task synthesis, step sampling, step verification, and preference tuning.
arXiv Detail & Related papers (2025-04-30T12:01:27Z) - RL-GPT: Integrating Reinforcement Learning and Code-as-policy [82.1804241891039]
We introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.
The slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks.
This decomposition effectively focuses each agent on specific tasks, proving highly efficient within our pipeline.
arXiv Detail & Related papers (2024-02-29T16:07:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.