AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
- URL: http://arxiv.org/abs/2402.04253v1
- Date: Tue, 6 Feb 2024 18:59:57 GMT
- Title: AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
- Authors: Yu Du, Fangyun Wei, Hongyang Zhang
- Abstract summary: We introduce AnyTool, a large language model agent designed to revolutionize the utilization of a vast array of tools in addressing user queries.
We utilize over 16,000 APIs from Rapid API, operating under the assumption that a subset of these APIs could potentially resolve the queries.
AnyTool primarily incorporates three elements: an API retriever with a hierarchical structure, a solver aimed at resolving user queries using a selected set of API candidates, and a self-reflection mechanism.
- Score: 30.792186243538037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce AnyTool, a large language model agent designed to revolutionize
the utilization of a vast array of tools in addressing user queries. We utilize
over 16,000 APIs from Rapid API, operating under the assumption that a subset
of these APIs could potentially resolve the queries. AnyTool primarily
incorporates three elements: an API retriever with a hierarchical structure, a
solver aimed at resolving user queries using a selected set of API candidates,
and a self-reflection mechanism, which re-activates AnyTool if the initial
solution proves impracticable. AnyTool is powered by the function calling
feature of GPT-4, eliminating the need for training external modules. We also
revisit the evaluation protocol introduced by previous works and identify a
limitation in this protocol that leads to an artificially high pass rate. By
revising the evaluation protocol to better reflect practical application
scenarios, we introduce an additional benchmark, termed AnyToolBench.
Experiments across various datasets demonstrate the superiority of our AnyTool
over strong baselines such as ToolLLM and a GPT-4 variant tailored for tool
utilization. For instance, AnyTool outperforms ToolLLM by +35.4% in terms of
average pass rate on ToolBench. Code will be available at
https://github.com/dyabel/AnyTool.
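The three elements named in the abstract (hierarchical retriever, solver, self-reflection) can be illustrated with a toy sketch. This is not the authors' implementation: AnyTool relies on GPT-4 function calling, whereas here a simple word-overlap score stands in for every LLM judgment, and the tree contents, `Node`, `retrieve`, `answer`, and the `executor` callback are all hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A category, a tool, or (when it has no children) a single API."""
    name: str
    description: str
    children: list = field(default_factory=list)


# Hypothetical category -> tool -> API hierarchy (illustrative data only).
ROOT = Node("root", "", [
    Node("Weather", "weather forecast temperature", [
        Node("forecast_tool", "city weather forecast", [
            Node("get_forecast", "weather forecast for a city"),
        ]),
    ]),
    Node("Finance", "stock price currency exchange money", [
        Node("stock_tool", "stock price quotes", [
            Node("get_stock_price", "latest stock price for a ticker"),
        ]),
        Node("fx_tool", "currency exchange rates", [
            Node("convert_currency", "currency exchange conversion"),
        ]),
    ]),
])


def overlap(query, text):
    """Count shared lowercase words; a stand-in for an LLM relevance judgment."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(node, query, tried):
    """Descend only into relevant children; return untried leaf APIs."""
    if not node.children:
        return [] if node.name in tried else [node]
    found = []
    for child in node.children:
        if overlap(query, child.description) > 0:
            found.extend(retrieve(child, query, tried))
    return found


def answer(query, executor, max_rounds=3, pool_size=1):
    """Retrieve, try to solve, and re-activate retrieval after a failure."""
    tried = set()
    for round_no in range(1, max_rounds + 1):
        pool = retrieve(ROOT, query, tried)
        pool = sorted(pool, key=lambda n: -overlap(query, n.description))[:pool_size]
        if not pool:
            return None, round_no  # nothing left to try: give up
        for api in pool:
            # executor is an assumed callback that reports whether the API solved the query
            if executor(api.name, query):
                return api.name, round_no
        # Self-reflection step: remember what failed, then search again
        tried.update(api.name for api in pool)
    return None, max_rounds
```

With an `executor` that only accepts `convert_currency`, the call `answer("stock price or currency exchange for USD", executor)` first tries `get_stock_price`, fails, re-runs retrieval without it, and returns `("convert_currency", 2)`. The Weather branch is never visited because its category description shares no words with the query, which is the point of retrieving hierarchically rather than scanning every API.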
Related papers
- Efficient and Scalable Estimation of Tool Representations in Vector Space [34.767193045989515]
We present a framework for generating synthetic data for tool retrieval applications and an efficient data-driven tool retrieval strategy using small encoder models.
We create ToolBank, a new tool retrieval dataset that reflects real human user usages.
With these new methods, we achieve improvements of up to 27.28 in Recall@K on the ToolBench dataset and 30.5 in Recall@K on ToolBank.
arXiv Detail & Related papers (2024-09-02T19:39:24Z)
- Chain of Tools: Large Language Model is an Automatic Multi-tool Learner [54.992464510992605]
Automatic Tool Chain (ATC) is a framework that enables large language models (LLMs) to act as multi-tool users.
To scale up the scope of the tools, we next propose a black-box probing method.
For a comprehensive evaluation, we build a challenging benchmark named ToolFlow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z)
- Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark [8.573278807410507]
This paper presents a new tool learning dataset Seal-Tools.
Seal-Tools contains self-instruct API-like tools.
It also includes instances which demonstrate the practical application of tools.
arXiv Detail & Related papers (2024-05-14T06:50:19Z)
- StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models [74.88844320554284]
We introduce StableToolBench, a benchmark evolving from ToolBench.
The virtual API server contains a caching system and API simulators which are complementary to alleviate the change in API status.
The stable evaluation system designs solvable pass and win rates using GPT-4 as the automatic evaluator to eliminate the randomness during evaluation.
arXiv Detail & Related papers (2024-03-12T14:57:40Z)
- EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [56.02100384015907]
EasyTool is a framework transforming diverse and lengthy tool documentation into a unified and concise tool instruction.
It can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios.
arXiv Detail & Related papers (2024-01-11T15:45:11Z)
- MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use [82.24774504584066]
Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities.
We introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools.
We conduct experiments involving eight popular LLMs and find that the majority of them still struggle to effectively select tools.
arXiv Detail & Related papers (2023-10-04T19:39:26Z)
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [104.37772295581088]
Open-source large language models (LLMs), e.g., LLaMA, remain significantly limited in tool-use capabilities.
We introduce ToolLLM, a general tool-use framework encompassing data construction, model training, and evaluation.
We first present ToolBench, an instruction-tuning dataset for tool use, which is constructed automatically using ChatGPT.
arXiv Detail & Related papers (2023-07-31T15:56:53Z)
- ToolCoder: Teach Code Generation Models to use API search tools [44.370920906850024]
We propose ToolCoder, a novel approach that integrates API search tools with existing models to assist in code generation and API selection.
Our experimental results demonstrate that ToolCoder exhibits excellent performance and generalization across five public and private library code generation benchmarks.
arXiv Detail & Related papers (2023-05-06T12:45:28Z)
- API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs [84.45284695156771]
API-Bank is a groundbreaking benchmark for tool-augmented Large Language Models.
We develop a runnable evaluation system consisting of 73 API tools.
We construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains.
arXiv Detail & Related papers (2023-04-14T14:05:32Z)