Toolken+: Improving LLM Tool Usage with Reranking and a Reject Option
- URL: http://arxiv.org/abs/2410.12004v1
- Date: Tue, 15 Oct 2024 19:09:03 GMT
- Title: Toolken+: Improving LLM Tool Usage with Reranking and a Reject Option
- Authors: Konstantin Yakovlev, Sergey Nikolenko, Andrey Bout,
- Abstract summary: We introduce Toolken+ that mitigates the first problem by reranking top $k$ tools selected by ToolkenGPT.
We demonstrate the effectiveness of Toolken+ on multistep numerical reasoning and tool selection tasks.
- Score: 5.61458021213001
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recently proposed ToolkenGPT tool learning paradigm demonstrates promising performance but suffers from two major issues: first, it cannot benefit from tool documentation, and second, it often makes mistakes in whether to use a tool at all. We introduce Toolken+ that mitigates the first problem by reranking top $k$ tools selected by ToolkenGPT and the second problem with a special "Reject" option such that the model will generate a vocabulary token if "Reject" is ranked first. We demonstrate the effectiveness of Toolken+ on multistep numerical reasoning and tool selection tasks.
Related papers
- ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients" [53.7887350405379]
Prior work synthesizes tool-use LLM datasets by first generating a user query, followed by complex tool-use annotations like DFS.<n>We introduce ToolGrad, an agentic framework that inverts this paradigm. ToolGrad first constructs valid tool-use chains through an iterative process guided by textual "gradients"<n>This "answer-first" approach led to ToolGrad-5k, a dataset generated with more complex tool use, lower cost, and 100% pass rate.
arXiv Detail & Related papers (2025-08-06T05:04:00Z) - Re-Initialization Token Learning for Tool-Augmented Large Language Models [49.91503552002649]
Large language models have demonstrated exceptional performance, yet struggle with complex tasks such as numerical reasoning, plan generation.<n>We propose a novel token learning method that aligns tool tokens with the existing word embedding space.<n>We evaluate the method on tasks such as numerical reasoning, knowledge-based question answering, and embodied plan generation using GSM8K-XL, FuncQA, KAMEL, and VirtualHome datasets.
arXiv Detail & Related papers (2025-06-17T07:11:00Z) - Advancing and Benchmarking Personalized Tool Invocation for LLMs [66.39214525683425]
We introduce the concept of Personalized Tool Invocation and define two key tasks: Tool Preference and Profile-dependent Query.<n>To tackle these challenges, we propose PTool, a data synthesis framework designed for personalized tool invocation.<n>We construct textbfPTBench, the first benchmark for evaluating personalized tool invocation.
arXiv Detail & Related papers (2025-05-07T02:25:20Z) - Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models [8.573278807410507]
Tool learning can further broaden the usage scenarios of large language models (LLMs)
We present a new Tool Learning method Chain-of-Tools.
It makes full use of the powerful semantic representation capability of frozen LLMs to finish tool calling in CoT reasoning.
arXiv Detail & Related papers (2025-03-21T01:26:12Z) - PTR: Precision-Driven Tool Recommendation for Large Language Models [43.53494041932615]
We propose a Precision-driven Tool Recommendation (PTR) approach for Large Language Models (LLMs)
PTR captures an initial, concise set of tools by leveraging historical tool bundle usage and dynamically adjusts the tool set by performing tool matching.
We present a new dataset, RecTools, and a metric, TRACC, designed to evaluate the effectiveness of tool recommendation for LLMs.
arXiv Detail & Related papers (2024-11-14T17:33:36Z) - Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval [47.81307125613145]
Re-Invoke is an unsupervised tool retrieval method designed to scale effectively to large toolsets without training.
We employ a novel multi-view similarity ranking strategy based on intents to pinpoint the most relevant tools for each query.
Our evaluation demonstrates that Re-Invoke significantly outperforms state-of-the-art alternatives in both single-tool and multi-tool scenarios.
arXiv Detail & Related papers (2024-08-03T22:49:27Z) - Tools Fail: Detecting Silent Errors in Faulty Tools [27.822981272044043]
We introduce a framework for tools which guides us to explore a model's ability to detect "silent" tool.
We provide an initial approach to failure recovery with promising results both on a controlled calculator setting and embodied agent planning.
arXiv Detail & Related papers (2024-06-27T14:52:34Z) - Enhancing Tool Retrieval with Iterative Feedback from Large Language Models [9.588592185027455]
Large language models (LLMs) can effectively handle a certain amount of tools through in-context learning or fine-tuning.
In real-world scenarios, the number of tools is typically extensive and irregularly updated, emphasizing the necessity for a dedicated tool retrieval component.
We propose to enhance tool retrieval with iterative feedback from the large language model.
arXiv Detail & Related papers (2024-06-25T11:12:01Z) - Tool-Planner: Task Planning with Clusters across Multiple Tools [29.278169900986434]
We propose Tool-Planner, a task-processing framework based on toolkits.
Tool-Planner groups tools based on the API functions with the same function into a toolkit and allows LLMs to implement planning across the various toolkits.
arXiv Detail & Related papers (2024-06-06T07:30:14Z) - Chain of Tools: Large Language Model is an Automatic Multi-tool Learner [54.992464510992605]
Automatic Tool Chain (ATC) is a framework that enables the large language models (LLMs) to act as a multi-tool user.
To scale up the scope of the tools, we next propose a black-box probing method.
For a comprehensive evaluation, we build a challenging benchmark named ToolFlow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z) - TOOLVERIFIER: Generalization to New Tools via Self-Verification [69.85190990517184]
We introduce a self-verification method which distinguishes between close candidates by self-asking contrastive questions during tool selection.
Experiments on 4 tasks from the ToolBench benchmark, consisting of 17 unseen tools, demonstrate an average improvement of 22% over few-shot baselines.
arXiv Detail & Related papers (2024-02-21T22:41:38Z) - ControlLLM: Augment Language Models with Tools by Searching on Graphs [97.62758830255002]
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving real-world tasks.
Our framework comprises three key components: (1) a textittask decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a textitThoughts-on-Graph (ToG) paradigm that searches the optimal solution path on a pre-built tool graph; and (3) an textitexecution engine with a rich toolbox that interprets the solution path and runs the
arXiv Detail & Related papers (2023-10-26T21:57:21Z) - MetaTool Benchmark for Large Language Models: Deciding Whether to Use
Tools and Which to Use [82.24774504584066]
Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities.
We introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools.
We conduct experiments involving eight popular LLMs and find that the majority of them still struggle to effectively select tools.
arXiv Detail & Related papers (2023-10-04T19:39:26Z) - Tool Documentation Enables Zero-Shot Tool-Usage with Large Language
Models [90.96816639172464]
Large language models (LLMs) are taught to use new tools by providing a few demonstrations of the tool's usage.
We advocate the use of tool documentation, descriptions for the individual tool usage, over demonstrations.
arXiv Detail & Related papers (2023-08-01T17:21:38Z) - Large Language Models as Tool Makers [85.00361145117293]
We introduce a closed-loop framework, referred to as LLMs A s Tool Makers (LATM), where LLMs create their own reusable tools for problem-solving.
Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set of tasks. 2) tool using: another LLM acts as the tool user, which applies the tool built by the tool maker for problem-solving.
arXiv Detail & Related papers (2023-05-26T17:50:11Z) - ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via
Tool Embeddings [25.5476046472217]
Augmenting large language models with external tools has emerged as a promising approach to solving complex problems.
Recent in-context learning paradigm alleviates these issues, but the limited context length only allows for a few shots of demonstrations.
We propose an alternative approach, $textbfToolkenGPT$, which combines the benefits of both sides.
arXiv Detail & Related papers (2023-05-19T09:54:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.