Related papers: NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration

URL: http://arxiv.org/abs/2506.19500v2
Date: Fri, 31 Oct 2025 14:24:22 GMT
Title: NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration
Authors: Yan Jiang, Hao Zhou, LiZhong GU, Ai Han, TianLong Li,
Abstract summary: Large language models (LLMs) have recently demonstrated the ability to act as function call agents by invoking external tools. We propose NaviAgent, which decouples task planning from tool execution through graph-based modeling of the tool ecosystem. Experiments show that NaviAgent achieves the best task success rates across models and tasks, and integrating TWNM further boosts performance by up to 17 points on complex tasks.
Score: 13.925896302382043
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have recently demonstrated the ability to act as function call agents by invoking external tools, enabling them to solve tasks beyond their static knowledge. However, existing agents typically call tools one step at a time, without a global view of task structure. Because tools depend on each other, this leads to error accumulation and limited scalability, particularly when scaling to thousands of tools. To address these limitations, we propose NaviAgent, a novel bilevel architecture that decouples task planning from tool execution through graph-based modeling of the tool ecosystem. At the task-planning level, the LLM-based agent decides whether to respond directly, clarify user intent, invoke a toolchain, or execute tool outputs, ensuring broad coverage of interaction scenarios independent of inter-tool complexity. At the execution level, a continuously evolving Tool World Navigation Model (TWNM) encodes structural and behavioral relations among tools, guiding the agent to generate scalable and robust invocation sequences. By incorporating feedback from real tool interactions, NaviAgent supports closed-loop optimization of planning and execution, moving beyond tool calling toward adaptive navigation of large-scale tool ecosystems. Experiments show that NaviAgent achieves the best task success rates across models and tasks, and integrating TWNM further boosts performance by up to 17 points on complex tasks, underscoring its key role in toolchain orchestration.
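Illustration: the two levels described in the abstract can be pictured with a small Python sketch. This is not the authors' implementation. The tool names, the `TOOL_DEPENDENCIES` graph, and the functions `choose_action` and `plan_toolchain` are hypothetical; a rule-based function stands in for the LLM's decision, and a plain dependency-ordered traversal stands in for the learned TWNM.

```python
from enum import Enum


class Action(Enum):
    """The four planning-level choices named in the abstract."""
    RESPOND = "respond directly"
    CLARIFY = "clarify user intent"
    INVOKE_TOOLCHAIN = "invoke a toolchain"
    EXECUTE_OUTPUTS = "execute tool outputs"


# Hypothetical tool graph: each tool maps to the tools whose outputs it needs.
TOOL_DEPENDENCIES = {
    "search_flights": [],
    "get_weather": [],
    "book_flight": ["search_flights"],
    "send_itinerary": ["book_flight", "get_weather"],
}


def choose_action(needs_tool, intent_is_clear, pending_outputs):
    """Toy stand-in for the LLM's decision among the four actions."""
    if pending_outputs:
        return Action.EXECUTE_OUTPUTS
    if not intent_is_clear:
        return Action.CLARIFY
    if needs_tool:
        return Action.INVOKE_TOOLCHAIN
    return Action.RESPOND


def plan_toolchain(target, graph):
    """Return an invocation order in which every tool follows its dependencies."""
    order, visiting, done = [], set(), set()

    def visit(tool):
        if tool in done:
            return
        if tool in visiting:
            raise ValueError(f"cyclic dependency at {tool!r}")
        visiting.add(tool)
        for dependency in graph[tool]:
            visit(dependency)
        visiting.discard(tool)
        done.add(tool)
        order.append(tool)

    visit(target)
    return order


if __name__ == "__main__":
    action = choose_action(needs_tool=True, intent_is_clear=True, pending_outputs=False)
    print(action.value)
    if action is Action.INVOKE_TOOLCHAIN:
        print(plan_toolchain("send_itinerary", TOOL_DEPENDENCIES))
```

The point of the sketch is the separation of concerns: the planner only picks one of four actions, while the whole invocation sequence is derived from the graph in one pass rather than chosen tool by tool.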

Related papers

Learning to Rewrite Tool Descriptions for Reliable LLM-Agent Tool Use [21.666294374943178]
We propose a curriculum learning framework that transfers supervision from trace-rich settings to trace-free deployment. Experiments show consistent gains on unseen tools, strong cross-domain generalization, and robustness as the number of candidate tools scales to over 100.
arXiv Detail & Related papers (2026-02-23T23:50:24Z)
ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Intrinsic Adaptation [60.25542764389203]
Agentic systems powered by Large Language Models (LLMs) have demonstrated remarkable potential in tackling complex, long-horizon tasks. Existing approaches, relying on manual orchestration or runtime-based patches, often struggle with poor generalization and fragmented optimization. We propose ToolSelf, a novel paradigm enabling tool-driven self-readjustment.
arXiv Detail & Related papers (2026-02-08T09:27:18Z)
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning [66.24374176797075]
We introduce AdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that prioritizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage.
arXiv Detail & Related papers (2026-01-26T16:04:43Z)
AutoTool: Efficient Tool Selection for Large Language Model Agents [10.061664247482488]
Large Language Model (LLM) agents have emerged as powerful tools for automating complex tasks by leveraging the reasoning and decision-making abilities of LLMs. However, a major bottleneck lies in the high inference cost of tool selection, especially in approaches like ReAct that repeatedly invoke the LLM to determine which tool to use at each step. We propose AutoTool, a novel graph-based framework that bypasses repeated LLM inference by exploiting a key empirical observation: tool usage inertia.
arXiv Detail & Related papers (2025-11-18T16:41:48Z)
ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use [64.20714385692634]
ToolScope is an agentic framework designed to unify global planning with local multimodal perception. We evaluate ToolScope on four VQA benchmarks across diverse domains, including VQA 2.0, ScienceQA, MAT-Search and MathVista.
arXiv Detail & Related papers (2025-10-31T10:51:27Z)
DeepAgent: A General Reasoning Agent with Scalable Toolsets [111.6384541877723]
DeepAgent is an end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution. To address the challenges of long-horizon interactions, we introduce an autonomous memory folding mechanism that compresses past interactions into structured episodic, working, and tool memories. We develop an end-to-end reinforcement learning strategy, namely ToolPO, that leverages LLM-simulated APIs and applies tool-call advantage attribution to assign fine-grained credit to the tool invocation tokens.
arXiv Detail & Related papers (2025-10-24T16:24:01Z)
ToolLibGen: Scalable Automatic Tool Creation and Aggregation for LLM Reasoning [80.10274552177096]
Large Language Models (LLMs) equipped with external tools have demonstrated enhanced performance on complex reasoning tasks. The widespread adoption of this tool-augmented reasoning is hindered by the scarcity of domain-specific tools. We propose a systematic approach to automatically refactor an unstructured collection of tools into a structured tool library.
arXiv Detail & Related papers (2025-10-09T04:11:16Z)
AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search [58.98450205734779]
Large language model (LLM) agents have demonstrated strong capabilities across diverse domains. Existing agent search methods suffer from three major limitations. We introduce a comprehensive framework to address these challenges.
arXiv Detail & Related papers (2025-06-06T12:07:23Z)
Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents [31.651748374218446]
Large language models (LLMs) have recently achieved remarkable results in complex multi-step tasks. They often struggle to maintain consistent performance across multiple solution attempts.
arXiv Detail & Related papers (2025-05-19T18:50:15Z)
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning [69.32855772335624]
Multimodal agents, which integrate a controller (e.g., a vision language model) with external tools, have demonstrated remarkable capabilities in tackling complex multimodal tasks. Existing approaches for training these agents depend on extensive human-annotated task-answer pairs and tool trajectories. We propose an iterative tool usage exploration method for multimodal agents without any pre-collected data, namely SPORT. SPORT has four iterative components: task synthesis, step sampling, step verification, and preference tuning.
arXiv Detail & Related papers (2025-04-30T12:01:27Z)
GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks [0.0]
Sonnet 3.5 and GPT-4o achieve the best overall performance, with Claude models excelling on solvable tasks. Common errors include misunderstanding geometrical relationships, relying on outdated knowledge, and inefficient data manipulation.
arXiv Detail & Related papers (2025-03-23T16:20:14Z)
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation [68.17081518640934]
We propose a PrImitive-driVen waypOinT-aware world model for Robotic manipulation (PIVOT-R). PIVOT-R consists of a Waypoint-aware World Model (WAWM) and a lightweight action prediction module. Our PIVOT-R outperforms state-of-the-art open-source models on the SeaWave benchmark, achieving an average relative improvement of 19.45% across four levels of instruction tasks.
arXiv Detail & Related papers (2024-10-14T11:30:18Z)
MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation [25.360660222418183]
We present MetaTool, a novel tool learning methodology designed to generalize across any reusable toolset. By incorporating meta-task data into task-oriented training, our method significantly enhances the performance of open-source Large Language Models.
arXiv Detail & Related papers (2024-07-15T10:15:41Z)
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets [99.8988504388011]
APIGen is an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets. We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agent domains.
arXiv Detail & Related papers (2024-06-26T17:49:11Z)
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models [58.08177466768262]
Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. We introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore this graph autonomously. Experimental results on the LV-Eval dataset reveal that GraphReader, using a 4k context window, consistently outperforms GPT-4-128k across context lengths from 16k to 256k by a large margin.
arXiv Detail & Related papers (2024-06-20T17:57:51Z)
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration [52.25473993987409]
We propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance. The architecture comprises three agents: planning agent, decision agent, and reflection agent. We show that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture.
arXiv Detail & Related papers (2024-06-03T05:50:00Z)
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search [36.142986105945894]
Large language models (LLMs) have demonstrated powerful decision-making and planning capabilities. We propose ToolChain*, an efficient tree search-based planning algorithm for LLM-based agents. It formulates the entire action space as a decision tree, where each node represents a possible API function call involved in a solution plan.
arXiv Detail & Related papers (2023-10-20T02:24:35Z)
Learning Graph-Enhanced Commander-Executor for Multi-Agent Navigation [28.71585436726336]
Multi-agent reinforcement learning (MARL) has shown promising results for solving this issue. Goal-conditioned hierarchical reinforcement learning (HRL) provides a promising direction to tackle this challenge. We propose MAGE-X, a graph-based goal-conditioned hierarchical method for multi-agent navigation tasks.
arXiv Detail & Related papers (2023-02-08T14:44:21Z)
Constructing Stronger and Faster Baselines for Skeleton-based Action Recognition [19.905455701387194]
We present an efficient Graph Convolutional Network (GCN) baseline for skeleton-based action recognition. On two large-scale datasets, i.e., NTU RGB+D 60 and 120, the proposed EfficientGCN-B4 baseline outperforms other State-Of-The-Art (SOTA) methods.
arXiv Detail & Related papers (2021-06-29T07:09:11Z)
Learning to Generate Content-Aware Dynamic Detectors [62.74209921174237]
We introduce a new perspective on designing efficient detectors: automatically generating sample-adaptive model architectures. We introduce a coarse-to-fine strategy tailored for object detection to guide the learning of dynamic routing. Experiments on the MS-COCO dataset demonstrate that CADDet achieves 1.8 higher mAP with 10% fewer FLOPs compared with vanilla routing.
arXiv Detail & Related papers (2020-12-08T08:05:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.