LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
- URL: http://arxiv.org/abs/2403.04746v1
- Date: Thu, 7 Mar 2024 18:50:51 GMT
- Title: LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
- Authors: Boshi Wang, Hao Fang, Jason Eisner, Benjamin Van Durme, Yu Su
- Abstract summary: Existing large language models (LLMs) only reach a correctness rate in the range of 30% to 60%.
We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE)
STE orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory.
- Score: 54.954211216847135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tools are essential for large language models (LLMs) to acquire up-to-date
information and take consequential actions in external environments. Existing
work on tool-augmented LLMs primarily focuses on the broad coverage of tools
and the flexibility of adding new tools. However, a critical aspect that has
surprisingly been understudied is simply how accurately an LLM uses tools for
which it has been trained. We find that existing LLMs, including GPT-4 and
open-source LLMs specifically fine-tuned for tool use, only reach a correctness
rate in the range of 30% to 60%, far from reliable use in practice. We propose
a biologically inspired method for tool-augmented LLMs, simulated trial and
error (STE), that orchestrates three key mechanisms for successful tool use
behaviors in the biological system: trial and error, imagination, and memory.
Specifically, STE leverages an LLM's 'imagination' to simulate plausible
scenarios for using a tool, after which the LLM interacts with the tool to
learn from its execution feedback. Both short-term and long-term memory are
employed to improve the depth and breadth of the exploration, respectively.
Comprehensive experiments on ToolBench show that STE substantially improves
tool learning for LLMs under both in-context learning and fine-tuning settings,
bringing a boost of 46.7% to Mistral-Instruct-7B and enabling it to outperform
GPT-4. We also show effective continual learning of tools via a simple
experience replay strategy.
Related papers
- Self-Training Large Language Models for Tool-Use Without Demonstrations [15.17750971071501]
Large language models (LLMs) remain prone to factual inaccuracies and computational errors.
Recent work augmented LLMs with tools to mitigate these shortcomings, but often requires curated gold tool-use demonstrations.
This paper investigates whether LLMs can learn to use tools without demonstrations.
arXiv Detail & Related papers (2025-02-09T12:06:10Z) - Tool Unlearning for Tool-Augmented LLMs [14.755831733659699]
Tool-augmented large language models (LLMs) are often trained on datasets of query-response pairs.
ToolDelete is the first approach for unlearning tools from tool-augmented LLMs.
arXiv Detail & Related papers (2025-02-03T05:50:55Z) - Chain of Tools: Large Language Model is an Automatic Multi-tool Learner [54.992464510992605]
Automatic Tool Chain (ATC) is a framework that enables the large language models (LLMs) to act as a multi-tool user.
To scale up the scope of the tools, we next propose a black-box probing method.
For a comprehensive evaluation, we build a challenging benchmark named ToolFlow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z) - Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models.
Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions.
We propose a novel modelagnostic COllaborative Learning-based Tool Retrieval approach, COLT, which captures not only the semantic similarities between user queries and tool descriptions but also takes into account the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z) - Efficient Tool Use with Chain-of-Abstraction Reasoning [63.08202389132155]
Large language models (LLMs) need to ground their reasoning to real-world knowledge.
There remains challenges for fine-tuning LLM agents to invoke tools in multi-step reasoning problems.
We propose a new method for LLMs to better leverage tools in multi-step reasoning.
arXiv Detail & Related papers (2024-01-30T21:53:30Z) - EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [56.02100384015907]
EasyTool is a framework transforming diverse and lengthy tool documentation into a unified and concise tool instruction.
It can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios.
arXiv Detail & Related papers (2024-01-11T15:45:11Z) - ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios [49.33633818046644]
We propose ToolEyes, a fine-grained system tailored for the evaluation of large language models' tool learning capabilities in authentic scenarios.
The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning.
ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary between LLMs and the physical world.
arXiv Detail & Related papers (2024-01-01T12:49:36Z) - Confucius: Iterative Tool Learning from Introspection Feedback by
Easy-to-Difficult Curriculum [42.36892453363961]
We propose a novel tool learning framework to train large language models (LLMs) to use complicated tools in real-world scenarios.
We first propose a multi-stage learning method to teach the LLM to use various tools from an easy-to-difficult curriculum.
We then propose the Iterative Self-instruct from Introspective Feedback to dynamically construct the dataset to improve the ability to use the complicated tool.
arXiv Detail & Related papers (2023-08-27T07:53:00Z) - GPT4Tools: Teaching Large Language Model to Use Tools via
Self-instruction [41.36474802204914]
GPT4Tools is based on self-instruct to enable open-source LLMs, such as LLaMA and OPT, to use tools.
It generates an instruction-following dataset by prompting an advanced teacher with various multi-modal contexts.
arXiv Detail & Related papers (2023-05-30T05:27:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.