Related papers: ProTIP: Progressive Tool Retrieval Improves Planning

Tool-Aware Planning in Contact Center AI: Evaluating LLMs through Lineage-Guided Query Decomposition [2.8180871881371456]
We present a domain-grounded framework and benchmark for tool-aware plan generation in contact centers. Our contributions are threefold: (i) a reference-based plan evaluation framework operating in two modes - a metric-wise evaluator and a one-shot evaluator, and (ii) a data methodology that iteratively refines plans via an evaluator->optimizer loop.
arXiv Detail & Related papers (2026-02-16T17:36:05Z)
Dynamic Tool Dependency Retrieval for Efficient Function Calling [38.77768293858919]
We propose Dynamic Tool Dependency Retrieval (DTDR), a lightweight retrieval method that conditions on both the initial query and the evolving execution context. We benchmark DTDR against state-of-the-art retrieval methods across multiple datasets and Large Language Model backbones. Our results show that dynamic tool retrieval improves function calling success rates by between 23% and 104% compared to state-of-the-art static retrievers.
arXiv Detail & Related papers (2025-12-18T20:40:25Z)
MetaTPT: Meta Test-time Prompt Tuning for Vision-Language Models [62.20230218401528]
We propose Meta Test-Time Prompt Tuning (MetaTPT), a meta-learning framework that learns a self-supervised auxiliary task to guide test-time prompt tuning. By coupling augmentation learning with prompt tuning, MetaTPT improves test-time adaptation under domain shifts.
arXiv Detail & Related papers (2025-12-13T10:23:10Z)
Z-Space: A Multi-Agent Tool Orchestration Framework for Enterprise-Grade LLM Automation [3.518072776386001]
This paper proposes Z-Space, a data-generation-oriented multi-agent collaborative tool invocation framework. The framework has been deployed in the Eleme platform's technical division, serving large-scale test data generation scenarios. Production data demonstrates that the system reduces average token consumption in tool inference by 96.26%.
arXiv Detail & Related papers (2025-11-23T03:59:14Z)
TPS-Bench: Evaluating AI Agents' Tool Planning & Scheduling Abilities in Compounding Tasks [23.96822236741708]
Large language model (LLM) agents have exhibited strong problem-solving competence across domains like research and coding. This paper introduces TPS-Bench to benchmark the ability of LLM agents in solving such problems that demand Tool Planning and Scheduling.
arXiv Detail & Related papers (2025-11-03T12:45:39Z)
GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning [20.75113227786218]
Graph-based Agent Planning (GAP) is a novel framework that explicitly models inter-task dependencies through graph-based planning. Our approach trains agent foundation models to decompose complex tasks into dependency-aware sub-task graphs. This dependency-aware orchestration achieves substantial improvements in both execution efficiency and task accuracy.
arXiv Detail & Related papers (2025-10-29T09:35:55Z)
MassTool: A Multi-Task Search-Based Tool Retrieval Framework for Large Language Models [45.63804847907601]
MassTool is a multi-task search-based framework designed to enhance both query representation and tool retrieval accuracy. It employs a two-tower architecture: a tool usage detection tower that predicts the need for function calls, and a tool retrieval tower that leverages a query-centric graph convolution network (QC-GCN) for effective query-tool matching. By jointly optimizing tool usage detection loss, list-wise retrieval loss, and contrastive regularization loss, MassTool establishes a robust dual-step sequential decision-making pipeline for precise query understanding.
arXiv Detail & Related papers (2025-07-01T07:02:26Z)
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning [69.32855772335624]
Multimodal agents, which integrate a controller (e.g., a vision language model) with external tools, have demonstrated remarkable capabilities in tackling complex multimodal tasks. Existing approaches for training these agents depend on extensive human-annotated task-answer pairs and tool trajectories. We propose SPORT, an iterative tool usage exploration method for multimodal agents that requires no pre-collected data. SPORT has four iterative components: task synthesis, step sampling, step verification, and preference tuning.
arXiv Detail & Related papers (2025-04-30T12:01:27Z)
OTC: Optimal Tool Calls via Reinforcement Learning [87.28134636548705]
We propose a tool-integrated reward that jointly considers correctness and tool efficiency, promoting high tool productivity. Our approach reduces tool calls by up to 73.1% and improves tool productivity by up to 229.4%, while maintaining comparable answer accuracy.
arXiv Detail & Related papers (2025-04-21T05:40:05Z)
Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation [36.29566268457534]
We introduce a novel parallel tool invocation paradigm, DTA-Llama. First, we transform traditional tree-based tool search paths into a Directed Acyclic Graph (DAG) structure. DTA-Llama is then trained on the dataset to learn to iteratively divide the current task into several parallel tool invocation sub-tasks.
arXiv Detail & Related papers (2025-01-21T16:49:08Z)
StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs [44.906714156993694]
We introduce StepTool, a novel step-grained reinforcement learning framework to improve tool learning in Large Language Models. StepTool significantly outperforms existing methods in multi-step, tool-based tasks.
arXiv Detail & Related papers (2024-10-10T09:23:26Z)
Data-Efficient Massive Tool Retrieval: A Reinforcement Learning Approach for Query-Tool Alignment with Language Models [28.67532617021655]
Large language models (LLMs) integrated with external tools and APIs have successfully addressed complex tasks by using in-context learning or fine-tuning. Despite this progress, the vast scale of tool retrieval remains challenging due to stringent input length constraints. We propose a pre-retrieval strategy from an extensive repository, effectively framing the problem as the massive tool retrieval (MTR) task.
arXiv Detail & Related papers (2024-10-04T07:58:05Z)
TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning [61.14586098005874]
Current Large Language Models (LLMs) exhibit limited ability to understand table structures and to apply precise numerical reasoning. We introduce our Tool-Augmented Reasoning framework for Tables (TART), which integrates LLMs with specialized tools. TART contains three key components: a table formatter to ensure accurate data representation, a tool maker to develop specific computational tools, and an explanation generator to maintain explainability.
arXiv Detail & Related papers (2024-09-18T06:19:59Z)
Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval [47.81307125613145]
Re-Invoke is an unsupervised tool retrieval method designed to scale effectively to large toolsets without training. We employ a novel multi-view similarity ranking strategy based on intents to pinpoint the most relevant tools for each query. Our evaluation demonstrates that Re-Invoke significantly outperforms state-of-the-art alternatives in both single-tool and multi-tool scenarios.
arXiv Detail & Related papers (2024-08-03T22:49:27Z)
Context Tuning for Retrieval Augmented Generation [1.201626478128059]
We propose Context Tuning for RAG, which employs a smart context retrieval system to fetch relevant information. Our empirical results demonstrate that context tuning significantly enhances semantic search. We also show that our proposed lightweight model using Reciprocal Rank Fusion (RRF) with LambdaMART outperforms GPT-4 based retrieval.
arXiv Detail & Related papers (2023-12-09T23:33:16Z)
Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data. For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
ADaPT: As-Needed Decomposition and Planning with Language Models [131.063805299796]
We introduce As-Needed Decomposition and Planning for complex Tasks (ADaPT). ADaPT explicitly plans and decomposes complex sub-tasks as needed, when the Large Language Model is unable to execute them. Our results demonstrate that ADaPT substantially outperforms established strong baselines.
arXiv Detail & Related papers (2023-11-08T17:59:15Z)
ART: Automatic multi-step reasoning and tool-use for large language models [105.57550426609396]
Large language models (LLMs) can perform complex reasoning in few- and zero-shot settings. Each reasoning step can rely on external tools to support computation beyond the core LLM capabilities. We introduce Automatic Reasoning and Tool-use (ART), a framework that uses frozen LLMs to automatically generate intermediate reasoning steps as a program.
arXiv Detail & Related papers (2023-03-16T01:04:45Z)
Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph. Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference. Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.