Related papers: Tools Fail: Detecting Silent Errors in Faulty Tools

Tools Fail: Detecting Silent Errors in Faulty Tools

URL: http://arxiv.org/abs/2406.19228v1
Date: Thu, 27 Jun 2024 14:52:34 GMT
Title: Tools Fail: Detecting Silent Errors in Faulty Tools
Authors: Jimin Sun, So Yeon Min, Yingshan Chang, Yonatan Bisk,
Abstract summary: We introduce a framework for tools which guides us to explore a model's ability to detect "silent" tool. We provide an initial approach to failure recovery with promising results both on a controlled calculator setting and embodied agent planning.
Score: 27.822981272044043
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tools have become a mainstay of LLMs, allowing them to retrieve knowledge not in their weights, to perform tasks on the web, and even to control robots. However, most ontologies and surveys of tool-use have assumed the core challenge for LLMs is choosing the tool. Instead, we introduce a framework for tools more broadly which guides us to explore a model's ability to detect "silent" tool errors, and reflect on how to plan. This more directly aligns with the increasingly popular use of models as tools. We provide an initial approach to failure recovery with promising results both on a controlled calculator setting and embodied agent planning.

Related papers

RefTool: Enhancing Model Reasoning with Reference-Guided Tool Creation [44.128974924517465]
RefTool is a reference-guided framework for automatic tool creation.<n>It generates executable tools from reference content, validate them using illustrative examples, and organize them hierarchically into a toolbox.<n> Experiments on causality, physics, and chemistry benchmarks demonstrate that RefTool outperforms existing tool-creation and domain-specific reasoning methods.
arXiv Detail & Related papers (2025-05-27T16:41:19Z)
Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use. MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools. Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
arXiv Detail & Related papers (2025-02-18T15:45:01Z)
Tool Unlearning for Tool-Augmented LLMs [14.755831733659699]
Tool-augmented large language models (LLMs) are often trained on datasets of query-response pairs. ToolDelete is the first approach for unlearning tools from tool-augmented LLMs.
arXiv Detail & Related papers (2025-02-03T05:50:55Z)
PTR: Precision-Driven Tool Recommendation for Large Language Models [43.53494041932615]
We propose a Precision-driven Tool Recommendation (PTR) approach for Large Language Models (LLMs) PTR captures an initial, concise set of tools by leveraging historical tool bundle usage and dynamically adjusts the tool set by performing tool matching. We present a new dataset, RecTools, and a metric, TRACC, designed to evaluate the effectiveness of tool recommendation for LLMs.
arXiv Detail & Related papers (2024-11-14T17:33:36Z)
Learning to Ask: When LLMs Meet Unclear Instruction [49.256630152684764]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone. We evaluate the performance of LLMs tool-use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench. We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z)
Tool-Planner: Task Planning with Clusters across Multiple Tools [29.278169900986434]
We propose Tool-Planner, a task-processing framework based on toolkits. Tool-Planner groups tools based on the API functions with the same function into a toolkit and allows LLMs to implement planning across the various toolkits.
arXiv Detail & Related papers (2024-06-06T07:30:14Z)
Chain of Tools: Large Language Model is an Automatic Multi-tool Learner [54.992464510992605]
Automatic Tool Chain (ATC) is a framework that enables the large language models (LLMs) to act as a multi-tool user. To scale up the scope of the tools, we next propose a black-box probing method. For a comprehensive evaluation, we build a challenging benchmark named ToolFlow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z)
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [54.954211216847135]
Existing large language models (LLMs) only reach a correctness rate in the range of 30% to 60%. We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE) STE orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory.
arXiv Detail & Related papers (2024-03-07T18:50:51Z)
ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph [43.95759808077083]
Existing in-context learning approaches simply format tools into a list of plain text descriptions and input them to large language models. This paper proposes ToolNet, a plug-and-play framework that scales up the number of tools to thousands with a moderate increase in token consumption.
arXiv Detail & Related papers (2024-02-29T02:04:00Z)
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [56.02100384015907]
EasyTool is a framework transforming diverse and lengthy tool documentation into a unified and concise tool instruction. It can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios.
arXiv Detail & Related papers (2024-01-11T15:45:11Z)
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use [82.24774504584066]
Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities. We introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools. We conduct experiments involving eight popular LLMs and find that the majority of them still struggle to effectively select tools.
arXiv Detail & Related papers (2023-10-04T19:39:26Z)
Large Language Models as Tool Makers [85.00361145117293]
We introduce a closed-loop framework, referred to as LLMs A s Tool Makers (LATM), where LLMs create their own reusable tools for problem-solving. Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set of tasks. 2) tool using: another LLM acts as the tool user, which applies the tool built by the tool maker for problem-solving.
arXiv Detail & Related papers (2023-05-26T17:50:11Z)
Making Language Models Better Tool Learners with Execution Feedback [36.30542737293863]
Tools serve as pivotal interfaces that enable humans to understand and reshape the environment. Existing tool learning methodologies induce large language models to utilize tools indiscriminately. We propose Tool leaRning wIth exeCution fEedback (TRICE), a two-stage end-to-end framework that enables the model to continually learn through feedback derived from tool execution.
arXiv Detail & Related papers (2023-05-22T14:37:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.