On the Tool Manipulation Capability of Open-source Large Language Models
- URL: http://arxiv.org/abs/2305.16504v1
- Date: Thu, 25 May 2023 22:10:20 GMT
- Title: On the Tool Manipulation Capability of Open-source Large Language Models
- Authors: Qiantong Xu, Fenglu Hong, Bo Li, Changran Hu, Zhengyu Chen, Jian Zhang
- Abstract summary: We show can we enhance open-source LLMs to be competitive to leading closed LLM APIs in tool manipulation.
Our techniques can boost leading open-source LLMs by up to 90% success rate, showing capabilities competitive to OpenAI GPT-4 in 4 out of 8 ToolBench tasks.
- Score: 19.6917640220883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies on software tool manipulation with large language models
(LLMs) mostly rely on closed model APIs. The industrial adoption of these
models is substantially constrained due to the security and robustness risks in
exposing information to closed LLM API services. In this paper, we ask can we
enhance open-source LLMs to be competitive to leading closed LLM APIs in tool
manipulation, with practical amount of human supervision. By analyzing common
tool manipulation failures, we first demonstrate that open-source LLMs may
require training with usage examples, in-context demonstration and generation
style regulation to resolve failures. These insights motivate us to revisit
classical methods in LLM literature, and demonstrate that we can adapt them as
model alignment with programmatic data generation, system prompts and
in-context demonstration retrievers to enhance open-source LLMs for tool
manipulation. To evaluate these techniques, we create the ToolBench, a tool
manipulation benchmark consisting of diverse software tools for real-world
tasks. We demonstrate that our techniques can boost leading open-source LLMs by
up to 90% success rate, showing capabilities competitive to OpenAI GPT-4 in 4
out of 8 ToolBench tasks. We show that such enhancement typically requires
about one developer day to curate data for each tool, rendering a recipe with
practical amount of human supervision.
Related papers
- Chain of Tools: Large Language Model is an Automatic Multi-tool Learner [54.992464510992605]
Automatic Tool Chain (ATC) is a framework that enables the large language models (LLMs) to act as a multi-tool user.
To scale up the scope of the tools, we next propose a black-box probing method.
For a comprehensive evaluation, we build a challenging benchmark named ToolFlow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z) - Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models.
Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions.
We propose a novel modelagnostic COllaborative Learning-based Tool Retrieval approach, COLT, which captures not only the semantic similarities between user queries and tool descriptions but also takes into account the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z) - LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [54.954211216847135]
Existing large language models (LLMs) only reach a correctness rate in the range of 30% to 60%.
We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE)
STE orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory.
arXiv Detail & Related papers (2024-03-07T18:50:51Z) - Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models [26.28459880766842]
We propose a decision-aware and generalizable tool-usage framework (DEER)
Specifically, we first construct the tool-usage samples with multiple decision branches via an automatic generation pipeline.
Our proposed DEER is effective and significantly outperforms baselines across various datasets.
arXiv Detail & Related papers (2024-02-26T16:11:03Z) - Efficient Tool Use with Chain-of-Abstraction Reasoning [65.18096363216574]
Large language models (LLMs) need to ground their reasoning to real-world knowledge.
There remains challenges for fine-tuning LLM agents to invoke tools in multi-step reasoning problems.
We propose a new method for LLMs to better leverage tools in multi-step reasoning.
arXiv Detail & Related papers (2024-01-30T21:53:30Z) - Confucius: Iterative Tool Learning from Introspection Feedback by
Easy-to-Difficult Curriculum [42.36892453363961]
We propose a novel tool learning framework to train large language models (LLMs) to use complicated tools in real-world scenarios.
We first propose a multi-stage learning method to teach the LLM to use various tools from an easy-to-difficult curriculum.
We then propose the Iterative Self-instruct from Introspective Feedback to dynamically construct the dataset to improve the ability to use the complicated tool.
arXiv Detail & Related papers (2023-08-27T07:53:00Z) - GPT4Tools: Teaching Large Language Model to Use Tools via
Self-instruction [41.36474802204914]
GPT4Tools is based on self-instruct to enable open-source LLMs, such as LLaMA and OPT, to use tools.
It generates an instruction-following dataset by prompting an advanced teacher with various multi-modal contexts.
arXiv Detail & Related papers (2023-05-30T05:27:21Z) - Large Language Models as Tool Makers [85.00361145117293]
We introduce a closed-loop framework, referred to as LLMs A s Tool Makers (LATM), where LLMs create their own reusable tools for problem-solving.
Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set of tasks. 2) tool using: another LLM acts as the tool user, which applies the tool built by the tool maker for problem-solving.
arXiv Detail & Related papers (2023-05-26T17:50:11Z) - CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models [74.22729793816451]
Large Language Models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability.
We propose CREATOR, a novel framework that enables LLMs to create their own tools using documentation and code realization.
We evaluate CREATOR on MATH and TabMWP benchmarks, respectively consisting of challenging math competition problems.
arXiv Detail & Related papers (2023-05-23T17:51:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.