ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world
APIs
- URL: http://arxiv.org/abs/2307.16789v2
- Date: Tue, 3 Oct 2023 14:45:48 GMT
- Title: ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world
APIs
- Authors: Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu,
Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong,
Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu,
Maosong Sun
- Abstract summary: Open-source large language models (LLMs), e.g., LLaMA, remain significantly limited in tool-use capabilities.
We introduce ToolLLM, a general tool-usetuning encompassing data construction, model training, and evaluation.
We first present ToolBench, an instruction-tuning framework for tool use, which is constructed automatically using ChatGPT.
- Score: 104.37772295581088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the advancements of open-source large language models (LLMs), e.g.,
LLaMA, they remain significantly limited in tool-use capabilities, i.e., using
external tools (APIs) to fulfill human instructions. The reason is that current
instruction tuning largely focuses on basic language tasks but ignores the
tool-use domain. This is in contrast to the excellent tool-use capabilities of
state-of-the-art (SOTA) closed-source LLMs, e.g., ChatGPT. To bridge this gap,
we introduce ToolLLM, a general tool-use framework encompassing data
construction, model training, and evaluation. We first present ToolBench, an
instruction-tuning dataset for tool use, which is constructed automatically
using ChatGPT. Specifically, the construction can be divided into three stages:
(i) API collection: we collect 16,464 real-world RESTful APIs spanning 49
categories from RapidAPI Hub; (ii) instruction generation: we prompt ChatGPT to
generate diverse instructions involving these APIs, covering both single-tool
and multi-tool scenarios; (iii) solution path annotation: we use ChatGPT to
search for a valid solution path (chain of API calls) for each instruction. To
enhance the reasoning capabilities of LLMs, we develop a novel depth-first
search-based decision tree algorithm. It enables LLMs to evaluate multiple
reasoning traces and expand the search space. Moreover, to evaluate the
tool-use capabilities of LLMs, we develop an automatic evaluator: ToolEval.
Based on ToolBench, we fine-tune LLaMA to obtain an LLM ToolLLaMA, and equip it
with a neural API retriever to recommend appropriate APIs for each instruction.
Experiments show that ToolLLaMA demonstrates a remarkable ability to execute
complex instructions and generalize to unseen APIs, and exhibits comparable
performance to ChatGPT. Our ToolLLaMA also demonstrates strong zero-shot
generalization ability in an out-of-distribution tool-use dataset: APIBench.
Related papers
- Learning to Ask: When LLMs Meet Unclear Instruction [49.256630152684764]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate the performance of LLMs tool-use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z) - ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities [30.030101957186595]
ToolSandbox is an evaluation framework for large language models (LLMs)
ToolSandbox includes stateful tool execution, implicit state dependencies between tools, a built-in user simulator supporting on-policy conversational evaluation.
We show that open source and proprietary models have a significant performance gap, and complex tasks like State Dependency, Canonicalization and Insufficient Information defined in ToolSandbox are challenging even the most capable SOTA LLMs.
arXiv Detail & Related papers (2024-08-08T05:45:42Z) - Chain of Tools: Large Language Model is an Automatic Multi-tool Learner [54.992464510992605]
Automatic Tool Chain (ATC) is a framework that enables the large language models (LLMs) to act as a multi-tool user.
To scale up the scope of the tools, we next propose a black-box probing method.
For a comprehensive evaluation, we build a challenging benchmark named ToolFlow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z) - AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls [30.792186243538037]
We introduce AnyTool, a large language model agent designed to revolutionize the utilization of a vast array of tools in addressing user queries.
We utilize over 16,000 APIs from Rapid API, operating under the assumption that a subset of these APIs could potentially resolve the queries.
AnyTool primarily incorporates three elements: an API retriever with a hierarchical structure, a solver aimed at resolving user queries using a selected set of API candidates, and a self-reflection mechanism.
arXiv Detail & Related papers (2024-02-06T18:59:57Z) - EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [56.02100384015907]
EasyTool is a framework transforming diverse and lengthy tool documentation into a unified and concise tool instruction.
It can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios.
arXiv Detail & Related papers (2024-01-11T15:45:11Z) - CRAFT: Customizing LLMs by Creating and Retrieving from Specialized
Toolsets [75.64181719386497]
We present CRAFT, a tool creation and retrieval framework for large language models (LLMs)
It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks.
Our method is designed to be flexible and offers a plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and modalities, without any finetuning.
arXiv Detail & Related papers (2023-09-29T17:40:26Z) - API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs [84.45284695156771]
API-Bank is a groundbreaking benchmark for tool-augmented Large Language Models.
We develop a run evaluation system consisting of 73 API tools.
We construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains.
arXiv Detail & Related papers (2023-04-14T14:05:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.