MetaTool Benchmark for Large Language Models: Deciding Whether to Use
Tools and Which to Use
- URL: http://arxiv.org/abs/2310.03128v5
- Date: Fri, 23 Feb 2024 13:19:52 GMT
- Title: MetaTool Benchmark for Large Language Models: Deciding Whether to Use
Tools and Which to Use
- Authors: Yue Huang and Jiawen Shi and Yuan Li and Chenrui Fan and Siyuan Wu and
Qihui Zhang and Yixin Liu and Pan Zhou and Yao Wan and Neil Zhenqiang Gong
and Lichao Sun
- Abstract summary: Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities.
We introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools.
We conduct experiments involving eight popular LLMs and find that the majority of them still struggle to effectively select tools.
- Score: 82.24774504584066
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) have garnered significant attention due to their
impressive natural language processing (NLP) capabilities. Recently, many
studies have focused on the tool-utilization ability of LLMs. These studies have
primarily investigated how LLMs collaborate effectively with specific, given tools.
However, in scenarios where LLMs serve as intelligent agents, as seen in
applications like AutoGPT and MetaGPT, LLMs are expected to engage in intricate
decision-making processes that involve deciding whether to employ a tool and
selecting the most suitable tool(s) from a collection of available tools to
fulfill user requests. Therefore, in this paper, we introduce MetaTool, a
benchmark designed to evaluate whether LLMs have tool usage awareness and can
correctly choose tools. Specifically, we create a dataset called ToolE within
the benchmark. This dataset contains various types of user queries in the form
of prompts that trigger LLMs to use tools, including both single-tool and
multi-tool scenarios. Subsequently, we set the tasks for both tool usage
awareness and tool selection. We define four subtasks from different
perspectives in tool selection, including tool selection with similar choices,
tool selection in specific scenarios, tool selection with possible reliability
issues, and multi-tool selection. We conduct experiments involving eight
popular LLMs and find that the majority of them still struggle to effectively
select tools, highlighting the existing gaps between LLMs and genuine
intelligent agents. However, our error analysis shows that there is still
significant room for improvement. Finally, we conclude with insights for tool
developers: we strongly recommend that they choose an appropriate rewrite model
for generating new tool descriptions based on the downstream LLM that will use
the tool. Our code is available at https://github.com/HowieHwong/MetaTool.
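To make the two evaluation tasks concrete, the following is a minimal sketch of how a single ToolE-style query could be scored for tool-usage awareness and tool selection. The record fields, prompt wording, candidate tool list, and the query_llm stub are illustrative assumptions rather than the benchmark's actual schema or code; see the repository above for the real implementation.
```python
# Minimal, illustrative sketch of a MetaTool-style evaluation for one query.
# All names below (record fields, prompts, query_llm) are assumptions for
# illustration, not the benchmark's actual schema or code.
import json

def query_llm(prompt: str) -> str:
    """Placeholder for a call to the LLM under evaluation."""
    return "yes"  # a real implementation would call an LLM API here

# A ToolE-style record: a user query plus ground-truth labels.
example = {
    "query": "Translate this paragraph into French and keep the formatting.",
    "needs_tool": True,            # tool-usage awareness label
    "gold_tools": ["translator"],  # tool-selection label (may list several tools)
}
tool_list = ["translator", "calculator", "web_search", "calendar"]

# Task 1: tool-usage awareness -- should the model use a tool at all?
awareness_prompt = (
    f"User query: {example['query']}\n"
    "Can you answer this query well without using any external tool? "
    "Reply with 'yes' or 'no' only."
)
claims_no_tool_needed = query_llm(awareness_prompt).strip().lower().startswith("yes")
awareness_correct = (not claims_no_tool_needed) == example["needs_tool"]

# Task 2: tool selection -- pick the right tool(s) from a candidate list.
selection_prompt = (
    f"User query: {example['query']}\n"
    f"Available tools: {', '.join(tool_list)}\n"
    "Which tool(s) would you use? Reply with tool names only."
)
reply = query_llm(selection_prompt)
selection_correct = all(tool in reply for tool in example["gold_tools"])

print(json.dumps({"awareness_correct": awareness_correct,
                  "selection_correct": selection_correct}, indent=2))
```
Requiring every gold tool to appear in the reply is one simple way to handle multi-tool queries; aggregate accuracy over many such records is then what gets compared across the evaluated LLMs.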
Related papers
- PTR: Precision-Driven Tool Recommendation for Large Language Models [43.53494041932615]
We propose a Precision-driven Tool Recommendation (PTR) approach for Large Language Models (LLMs).
PTR captures an initial, concise set of tools by leveraging historical tool bundle usage and dynamically adjusts the tool set by performing tool matching.
We present a new dataset, RecTools, and a metric, TRACC, designed to evaluate the effectiveness of tool recommendation for LLMs.
arXiv Detail & Related papers (2024-11-14T17:33:36Z)
- Chain of Tools: Large Language Model is an Automatic Multi-tool Learner [54.992464510992605]
Automatic Tool Chain (ATC) is a framework that enables large language models (LLMs) to act as multi-tool users.
To scale up the scope of the tools, we next propose a black-box probing method.
For a comprehensive evaluation, we build a challenging benchmark named ToolFlow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z)
- Let Me Do It For You: Towards LLM Empowered Recommendation via Tool Learning [57.523454568002144]
Large language models (LLMs) have shown capabilities in commonsense reasoning and leveraging external tools.
We introduce ToolRec, a framework for LLM-empowered recommendations via tool learning.
We formulate recommendation as a process of exploring user interests at attribute granularity.
We consider two types of attribute-oriented tools: rank tools and retrieval tools.
arXiv Detail & Related papers (2024-05-24T00:06:54Z)
- What Are Tools Anyway? A Survey from the Language Model Perspective [67.18843218893416]
Language models (LMs) are powerful, yet they are used mostly for text generation tasks.
We provide a unified definition of tools as external programs used by LMs.
We empirically study the efficiency of various tooling methods.
arXiv Detail & Related papers (2024-03-18T17:20:07Z)
- LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [54.954211216847135]
Existing large language models (LLMs) reach a correctness rate of only 30% to 60% in tool use.
We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE).
STE orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory.
arXiv Detail & Related papers (2024-03-07T18:50:51Z)
- Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models [26.28459880766842]
We propose a decision-aware and generalizable tool-usage framework (DEER).
Specifically, we first construct the tool-usage samples with multiple decision branches via an automatic generation pipeline.
Our proposed DEER is effective and significantly outperforms baselines across various datasets.
arXiv Detail & Related papers (2024-02-26T16:11:03Z)
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning [38.610185966889226]
We propose MLLM-Tool, a system incorporating open-source large language models and multi-modal encoders.
The trained LLMs can perceive multi-modal input instructions and then correctly select the function-matched tool.
Experiments reveal that our MLLM-Tool is capable of recommending appropriate tools for multi-modal instructions.
arXiv Detail & Related papers (2024-01-19T14:44:37Z)
- EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [56.02100384015907]
EasyTool is a framework that transforms diverse and lengthy tool documentation into unified and concise tool instructions.
It can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios.
arXiv Detail & Related papers (2024-01-11T15:45:11Z)