On the Tool Manipulation Capability of Open-source Large Language Models
- URL: http://arxiv.org/abs/2305.16504v1
- Date: Thu, 25 May 2023 22:10:20 GMT
- Title: On the Tool Manipulation Capability of Open-source Large Language Models
- Authors: Qiantong Xu, Fenglu Hong, Bo Li, Changran Hu, Zhengyu Chen, Jian Zhang
- Abstract summary: We show that we can enhance open-source LLMs to be competitive with leading closed LLM APIs in tool manipulation.
Our techniques can boost leading open-source LLMs by up to 90% success rate, showing capabilities competitive to OpenAI GPT-4 in 4 out of 8 ToolBench tasks.
- Score: 19.6917640220883
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies on software tool manipulation with large language models
(LLMs) mostly rely on closed model APIs. The industrial adoption of these
models is substantially constrained due to the security and robustness risks in
exposing information to closed LLM API services. In this paper, we ask whether we
can enhance open-source LLMs to be competitive with leading closed LLM APIs in tool
manipulation with a practical amount of human supervision. By analyzing common
tool manipulation failures, we first demonstrate that open-source LLMs may
require training with usage examples, in-context demonstration and generation
style regulation to resolve failures. These insights motivate us to revisit
classical methods in LLM literature, and demonstrate that we can adapt them as
model alignment with programmatic data generation, system prompts and
in-context demonstration retrievers to enhance open-source LLMs for tool
manipulation. To evaluate these techniques, we create ToolBench, a tool
manipulation benchmark consisting of diverse software tools for real-world
tasks. We demonstrate that our techniques can boost leading open-source LLMs by
up to a 90% success rate, showing capabilities competitive with OpenAI GPT-4 in 4
out of 8 ToolBench tasks. We show that such enhancement typically requires
about one developer day to curate data for each tool, rendering a recipe with
a practical amount of human supervision.
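One of the abstract's three techniques, the in-context demonstration retriever, can be pictured concretely. The sketch below is a toy illustration, not the paper's method: it uses token-overlap (Jaccard) similarity in place of whatever retriever the authors actually build, and the demonstration format, tool names, and function names are all assumptions.

```python
# Hypothetical sketch: retrieve the most similar tool-use demonstrations
# for a new goal and assemble them into a prompt behind a system prompt.

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def retrieve_demos(query: str, demos: list[tuple[str, str]], k: int = 2):
    """Return the k (goal, api_call) pairs most similar to the query."""
    ranked = sorted(demos, key=lambda d: jaccard(query, d[0]), reverse=True)
    return ranked[:k]

def build_prompt(system: str, query: str, demos, k: int = 2) -> str:
    """System prompt + retrieved demonstrations + the new goal."""
    lines = [system]
    for goal, call in retrieve_demos(query, demos, k):
        lines += [f"Goal: {goal}", f"Action: {call}"]
    lines += [f"Goal: {query}", "Action:"]
    return "\n".join(lines)

# Illustrative demonstration pool; the weather demos outrank the flight one.
demos = [
    ("check the weather in Paris", "weather.get(city='Paris')"),
    ("book a flight to Tokyo", "flights.search(dest='Tokyo')"),
    ("check the weather in Rome", "weather.get(city='Rome')"),
]
prompt = build_prompt("You can call weather and flight APIs.",
                      "check the weather in Berlin", demos)
```

The retrieved demonstrations regulate the model's generation style by example, which is the failure mode the abstract says such demonstrations address.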
Related papers
- Self-Training Large Language Models for Tool-Use Without Demonstrations [15.17750971071501]
Large language models (LLMs) remain prone to factual inaccuracies and computational errors.
Recent work augmented LLMs with tools to mitigate these shortcomings, but often requires curated gold tool-use demonstrations.
This paper investigates whether LLMs can learn to use tools without demonstrations.
arXiv Detail & Related papers (2025-02-09T12:06:10Z)
- Learning to Ask: When LLM Agents Meet Unclear Instruction [55.65312637965779]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate the tool-use performance of LLMs under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
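The ask-when-needed idea above can be sketched as a tiny control loop. This is not the cited paper's implementation: the tool schema, the keyword "parser" standing in for the LLM's argument filling, and all names are assumptions for illustration.

```python
# Hypothetical sketch: ask the user a clarifying question whenever a
# required tool argument cannot be filled from the instruction.

REQUIRED_ARGS = {"book_hotel": ["city", "checkin_date"]}

def extract_args(instruction: str, tool: str) -> dict:
    """Naive keyword 'parser' standing in for the LLM's argument filling."""
    args = {}
    if "in " in instruction:
        args["city"] = instruction.split("in ")[-1].rstrip(".")
    if "on " in instruction:
        args["checkin_date"] = instruction.split("on ")[-1].split(" in")[0]
    return args

def act_or_ask(instruction: str, tool: str):
    """Return a tool call if all required args are known, else a question."""
    args = extract_args(instruction, tool)
    missing = [a for a in REQUIRED_ARGS[tool] if a not in args]
    if missing:
        return ("ask", f"Could you tell me your {missing[0]}?")
    call = ", ".join(f"{k}={v!r}" for k, v in args.items())
    return ("call", f"{tool}({call})")

# An under-specified instruction triggers a question; a full one, a call.
kind, question = act_or_ask("Book a hotel in Berlin", "book_hotel")
kind2, call = act_or_ask("Book a hotel on May 3 in Berlin", "book_hotel")
```

The design point is that the agent never guesses a missing argument; it surfaces exactly one clarifying question per missing slot.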
arXiv Detail & Related papers (2024-08-31T23:06:12Z)
- Chain of Tools: Large Language Model is an Automatic Multi-tool Learner [54.992464510992605]
Automatic Tool Chain (ATC) is a framework that enables large language models (LLMs) to act as multi-tool users.
To scale up the scope of the tools, we next propose a black-box probing method.
For a comprehensive evaluation, we build a challenging benchmark named ToolFlow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z)
- Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models.
Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions.
We propose COLT, a novel model-agnostic COllaborative Learning-based Tool Retrieval approach that not only captures the semantic similarities between user queries and tool descriptions but also accounts for the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z)
- LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [54.954211216847135]
Existing large language models (LLMs) reach a tool-use correctness rate of only 30% to 60%.
We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE).
STE orchestrates three key mechanisms behind successful tool-use behaviors in biological systems: trial and error, imagination, and memory.
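The trial-and-error and memory mechanisms can be pictured with a toy loop. This is only a sketch in the spirit of the summary, not STE itself: the calculator "tool", the episode memory format, and all names are assumptions.

```python
# Hypothetical sketch: try a tool call, observe the error, store the
# episode in a memory that later trials can consult.

def calculator(expr: str) -> float:
    """A 'tool' that only accepts pure arithmetic expressions."""
    allowed = set("0123456789+-*/(). ")
    if not set(expr) <= allowed:
        raise ValueError("calculator accepts arithmetic only")
    return eval(expr)

memory = []  # episodes of (attempt, outcome) that inform later trials

def trial(expr: str):
    """One trial: call the tool, log success or failure to memory."""
    try:
        result = calculator(expr)
        memory.append((expr, "ok"))
        return result
    except ValueError as exc:
        memory.append((expr, f"error: {exc}"))
        return None

r1 = trial("two plus two")  # fails: natural language, not arithmetic
r2 = trial("2 + 2")         # a corrected call after seeing the error
```

The accumulated episodes play the role of memory: a model conditioned on them can avoid repeating the failed call format.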
arXiv Detail & Related papers (2024-03-07T18:50:51Z)
- Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models [26.28459880766842]
We propose DEER, a decision-aware and generalizable tool-usage framework.
Specifically, we first construct the tool-usage samples with multiple decision branches via an automatic generation pipeline.
Our proposed DEER is effective and significantly outperforms baselines across various datasets.
arXiv Detail & Related papers (2024-02-26T16:11:03Z)
- GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction [41.36474802204914]
GPT4Tools is based on self-instruct to enable open-source LLMs, such as LLaMA and OPT, to use tools.
It generates an instruction-following dataset by prompting an advanced teacher with various multi-modal contexts.
arXiv Detail & Related papers (2023-05-30T05:27:21Z)
- Large Language Models as Tool Makers [85.00361145117293]
We introduce a closed-loop framework, referred to as LLMs As Tool Makers (LATM), where LLMs create their own reusable tools for problem-solving.
Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set of tasks. 2) tool using: another LLM acts as the tool user, which applies the tool built by the tool maker for problem-solving.
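The two-phase split can be sketched with toy stand-ins. This is not LATM's implementation: here the "tool maker" is simulated by a hard-coded code string (in LATM an LLM writes it), and the task, function names, and tool format are all assumptions.

```python
# Hypothetical sketch of the two LATM phases: compile the maker's code
# into a reusable tool, then have a separate "user" apply it.

MAKER_OUTPUT = '''
def solve(items):
    """Tool: return the items sorted by their second field (price)."""
    return sorted(items, key=lambda it: it[1])
'''

def make_tool(code: str):
    """Phase 1, tool making: turn the maker's code into a callable."""
    ns = {}
    exec(code, ns)
    return ns["solve"]

def use_tool(tool, task):
    """Phase 2, tool using: apply the reusable tool to a new instance."""
    return tool(task)

tool = make_tool(MAKER_OUTPUT)
ranked = use_tool(tool, [("pen", 3), ("book", 12), ("mug", 7)])
```

The point of the split is amortization: the (expensive) maker runs once per task type, while the (cheap) user reuses the compiled tool on every new instance.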
arXiv Detail & Related papers (2023-05-26T17:50:11Z)
- CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models [74.22729793816451]
Large Language Models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability.
We propose CREATOR, a novel framework that enables LLMs to create their own tools using documentation and code realization.
We evaluate CREATOR on the MATH and TabMWP benchmarks, which consist of challenging math competition problems and tabular math word problems, respectively.
arXiv Detail & Related papers (2023-05-23T17:51:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.