ToolSword: Unveiling Safety Issues of Large Language Models in Tool
Learning Across Three Stages
- URL: http://arxiv.org/abs/2402.10753v1
- Date: Fri, 16 Feb 2024 15:19:46 GMT
- Title: ToolSword: Unveiling Safety Issues of Large Language Models in Tool
Learning Across Three Stages
- Authors: Junjie Ye, Sixian Li, Guanyu Li, Caishuang Huang, Songyang Gao, Yilong
Wu, Qi Zhang, Tao Gui, Xuanjing Huang
- Abstract summary: Tool learning is widely acknowledged as a foundational approach or deploying large language models (LLMs) in real-world scenarios.
$ToolSword$ is a framework dedicated to investigating safety issues linked to LLMs in tool learning.
Experiments conducted on 11 open-source and closed-source LLMs reveal enduring safety challenges in tool learning.
- Score: 46.86723087688694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tool learning is widely acknowledged as a foundational approach or deploying
large language models (LLMs) in real-world scenarios. While current research
primarily emphasizes leveraging tools to augment LLMs, it frequently neglects
emerging safety considerations tied to their application. To fill this gap, we
present $ToolSword$, a comprehensive framework dedicated to meticulously
investigating safety issues linked to LLMs in tool learning. Specifically,
ToolSword delineates six safety scenarios for LLMs in tool learning,
encompassing $malicious$ $queries$ and $jailbreak$ $attacks$ in the input
stage, $noisy$ $misdirection$ and $risky$ $cues$ in the execution stage, and
$harmful$ $feedback$ and $error$ $conflicts$ in the output stage. Experiments
conducted on 11 open-source and closed-source LLMs reveal enduring safety
challenges in tool learning, such as handling harmful queries, employing risky
tools, and delivering detrimental feedback, which even GPT-4 is susceptible to.
Moreover, we conduct further studies with the aim of fostering research on tool
learning safety. The data is released in
https://github.com/Junjie-Ye/ToolSword.
Related papers
- Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety [18.43622753923107]
Large language models (LLMs) see wider real-world use, understanding and mitigating their unsafe behaviors is critical.<n>We present the first survey that bridges this gap, introducing a unified framework that connects safety-focused interpretation methods, the safety enhancements they inform, and the tools that operationalize them.
arXiv Detail & Related papers (2025-06-05T17:56:05Z) - RRTL: Red Teaming Reasoning Large Language Models in Tool Learning [8.547055998769476]
The safety of newly emerging reasoning LLMs (RLLMs) in the context of tool learning remains underexplored.<n>We propose RRTL, a red teaming approach specifically designed to evaluate RLLMs in tool learning.<n>We conduct a comprehensive evaluation on seven mainstream RLLMs and uncover three key findings.
arXiv Detail & Related papers (2025-05-21T10:21:19Z) - Tool Unlearning for Tool-Augmented LLMs [14.755831733659699]
Tool-augmented large language models (LLMs) are often trained on datasets of query-response pairs.
ToolDelete is the first approach for unlearning tools from tool-augmented LLMs.
arXiv Detail & Related papers (2025-02-03T05:50:55Z) - Learning to Ask: When LLMs Meet Unclear Instruction [49.256630152684764]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate the performance of LLMs tool-use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z) - Tool Learning with Large Language Models: A Survey [60.733557487886635]
Tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems.
Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization.
arXiv Detail & Related papers (2024-05-28T08:01:26Z) - Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study [1.03590082373586]
We propose using large language models (LLMs) to assist in finding vulnerabilities in source code.
The aim is to test multiple state-of-the-art LLMs and identify the best prompting strategies.
We find that LLMs can pinpoint many more issues than traditional static analysis tools, outperforming traditional tools in terms of recall and F1 scores.
arXiv Detail & Related papers (2024-05-24T14:59:19Z) - LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [54.954211216847135]
Existing large language models (LLMs) only reach a correctness rate in the range of 30% to 60%.
We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE)
STE orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory.
arXiv Detail & Related papers (2024-03-07T18:50:51Z) - Efficient Tool Use with Chain-of-Abstraction Reasoning [65.18096363216574]
Large language models (LLMs) need to ground their reasoning to real-world knowledge.
There remains challenges for fine-tuning LLM agents to invoke tools in multi-step reasoning problems.
We propose a new method for LLMs to better leverage tools in multi-step reasoning.
arXiv Detail & Related papers (2024-01-30T21:53:30Z) - ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of
Large Language Models in Real-world Scenarios [48.38419686697733]
We propose ToolEyes, a fine-grained system tailored for the evaluation of large language models' tool learning capabilities in authentic scenarios.
The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning.
ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary between LLMs and the physical world.
arXiv Detail & Related papers (2024-01-01T12:49:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.