ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback
- URL: http://arxiv.org/abs/2409.14826v3
- Date: Mon, 4 Nov 2024 02:29:32 GMT
- Title: ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback
- Authors: Qinzhuo Wu, Wei Liu, Jian Luan, Bin Wang
- Abstract summary: Given an instruction, tool-augmented LLMs can interact with various external tools in multiple rounds and provide a final answer.
Previous LLMs were trained on overly detailed instructions, which included API names or parameters, while real users would not explicitly mention these API details.
To address these issues, we constructed a training dataset called MGToolBench, which contains statement and category-level instructions to better reflect real-world scenarios.
- Score: 11.931584529573176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, tool-augmented LLMs have gained increasing attention. Given an instruction, tool-augmented LLMs can interact with various external tools in multiple rounds and provide a final answer. However, previous LLMs were trained on overly detailed instructions, which included API names or parameters, while real users would not explicitly mention these API details. This leads to a gap between trained LLMs and real-world scenarios. In addition, most works ignore whether the interaction process follows the instruction. To address these issues, we constructed a training dataset called MGToolBench, which contains statement and category-level instructions to better reflect real-world scenarios. In addition, we propose ToolPlanner, a two-stage reinforcement learning framework that utilizes path planning and two feedback mechanisms to enhance the LLM's task completion and instruction-following capabilities. Experimental results show that ToolPlanner significantly improves the Match Rate, Pass Rate and Win Rate by 26.8%, 20.2%, and 5.6% compared to the SOTA model. Human evaluation verifies that the multi-granularity instructions can better align with users' usage habits. Our data and code will be released upon acceptance.
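The interaction pattern the abstract describes (a model that receives a coarse statement- or category-level instruction, calls external tools over multiple rounds, and then emits a final answer) can be illustrated with a short sketch. The sketch below is an assumption-laden illustration, not ToolPlanner's actual path planning, feedback mechanisms, or MGToolBench data: the `plan_step` stand-in, the toy tool registry, and the "finish" stopping convention are all hypothetical.

```python
# Minimal sketch of a multi-round tool-interaction loop driven by a coarse,
# category-level instruction. All names here (plan_step, TOOLS, "finish") are
# hypothetical; ToolPlanner's two-stage RL training is not reproduced.
from typing import Callable

# Toy tool registry keyed by tool name; real systems would expose API schemas.
TOOLS: dict[str, Callable[[str], str]] = {
    "weather.lookup": lambda city: f"Sunny, 22C in {city}",
    "weather.forecast": lambda city: f"Rain expected tomorrow in {city}",
}

def plan_step(instruction: str, history: list[str]) -> dict[str, str]:
    """Stand-in for an LLM call that decides the next action.
    A real system would prompt the model with the instruction and the
    interaction history; here a two-step path is hard-coded for illustration."""
    if not history:
        return {"tool": "weather.lookup", "arg": "Berlin"}
    return {"tool": "finish", "arg": "It is sunny and 22C in Berlin."}

def solve(instruction: str, max_rounds: int = 5) -> str:
    """Interact with tools round by round until the model signals it is done."""
    history: list[str] = []
    for _ in range(max_rounds):
        action = plan_step(instruction, history)
        if action["tool"] == "finish":  # model decides it has enough evidence
            return action["arg"]
        observation = TOOLS[action["tool"]](action["arg"])
        history.append(f"{action['tool']}({action['arg']}) -> {observation}")
    return "No answer within the round budget."

if __name__ == "__main__":
    # A statement-level instruction: no API names or parameters are mentioned,
    # mirroring the gap between prior training data and real user requests.
    print(solve("What's the weather like in Berlin right now?"))
```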
Related papers
- Learning to Ask: When LLMs Meet Unclear Instruction [49.256630152684764]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate the performance of LLMs' tool use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z)
- Chain of Tools: Large Language Model is an Automatic Multi-tool Learner [54.992464510992605]
Automatic Tool Chain (ATC) is a framework that enables large language models (LLMs) to act as a multi-tool user.
To scale up the scope of the tools, we next propose a black-box probing method.
For a comprehensive evaluation, we build a challenging benchmark named ToolFlow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z)
- LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [54.954211216847135]
Existing large language models (LLMs) only reach a tool-use correctness rate in the range of 30% to 60%.
We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE).
STE orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory.
arXiv Detail & Related papers (2024-03-07T18:50:51Z)
- Efficient Tool Use with Chain-of-Abstraction Reasoning [65.18096363216574]
Large language models (LLMs) need to ground their reasoning to real-world knowledge.
Challenges remain in fine-tuning LLM agents to invoke tools in multi-step reasoning problems.
We propose a new method for LLMs to better leverage tools in multi-step reasoning.
arXiv Detail & Related papers (2024-01-30T21:53:30Z)
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning [38.610185966889226]
We propose MLLM-Tool, a system incorporating open-source large language models and multi-modal encoders.
The learned LLMs can perceive multi-modal input instructions and then correctly select the function-matched tool.
Experiments reveal that our MLLM-Tool is capable of recommending appropriate tools for multi-modal instructions.
arXiv Detail & Related papers (2024-01-19T14:44:37Z)
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
- ToolQA: A Dataset for LLM Question Answering with External Tools [14.408707186450899]
Large Language Models (LLMs) have demonstrated impressive performance in various NLP tasks.
They still suffer from challenges such as hallucination and weak numerical reasoning.
To overcome these challenges, external tools can be used to enhance LLMs' question-answering abilities.
arXiv Detail & Related papers (2023-06-23T05:43:28Z)
- API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs [84.45284695156771]
API-Bank is a groundbreaking benchmark for tool-augmented Large Language Models.
We develop a runnable evaluation system consisting of 73 API tools.
We construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains.
arXiv Detail & Related papers (2023-04-14T14:05:32Z)