Related papers: ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages

URL: http://arxiv.org/abs/2402.10753v2
Date: Fri, 16 Aug 2024 04:12:00 GMT
Title: ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages
Authors: Junjie Ye, Sixian Li, Guanyu Li, Caishuang Huang, Songyang Gao, Yilong Wu, Qi Zhang, Tao Gui, Xuanjing Huang
Abstract summary: Tool learning is widely acknowledged as a foundational approach for deploying large language models (LLMs) in real-world scenarios. To fill this gap, we present *ToolSword*, a comprehensive framework dedicated to investigating safety issues linked to LLMs in tool learning.
Score: 45.16862486631841
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tool learning is widely acknowledged as a foundational approach for deploying large language models (LLMs) in real-world scenarios. While current research primarily emphasizes leveraging tools to augment LLMs, it frequently neglects emerging safety considerations tied to their application. To fill this gap, we present *ToolSword*, a comprehensive framework dedicated to meticulously investigating safety issues linked to LLMs in tool learning. Specifically, ToolSword delineates six safety scenarios for LLMs in tool learning, encompassing **malicious queries** and **jailbreak attacks** in the input stage, **noisy misdirection** and **risky cues** in the execution stage, and **harmful feedback** and **error conflicts** in the output stage. Experiments conducted on 11 open-source and closed-source LLMs reveal enduring safety challenges in tool learning, such as handling harmful queries, employing risky tools, and delivering detrimental feedback, which even GPT-4 is susceptible to. Moreover, we conduct further studies with the aim of fostering research on tool learning safety. The data is released at https://github.com/Junjie-Ye/ToolSword.

Related papers

Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety [18.43622753923107]
As large language models (LLMs) see wider real-world use, understanding and mitigating their unsafe behaviors is critical. We present the first survey that bridges this gap, introducing a unified framework that connects safety-focused interpretation methods, the safety enhancements they inform, and the tools that operationalize them.
arXiv Detail & Related papers (2025-06-05T17:56:05Z)
RRTL: Red Teaming Reasoning Large Language Models in Tool Learning [8.547055998769476]
The safety of newly emerging reasoning LLMs (RLLMs) in the context of tool learning remains underexplored. We propose RRTL, a red teaming approach specifically designed to evaluate RLLMs in tool learning. We conduct a comprehensive evaluation on seven mainstream RLLMs and uncover three key findings.
arXiv Detail & Related papers (2025-05-21T10:21:19Z)
Tool Unlearning for Tool-Augmented LLMs [14.755831733659699]
Tool-augmented large language models (LLMs) are often trained on datasets of query-response pairs. ToolDelete is the first approach for unlearning tools from tool-augmented LLMs.
arXiv Detail & Related papers (2025-02-03T05:50:55Z)
Learning to Ask: When LLMs Meet Unclear Instruction [49.256630152684764]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone. We evaluate LLMs' tool-use performance under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench. We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z)
Tool Learning with Large Language Models: A Survey [60.733557487886635]
Tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems. Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization.
arXiv Detail & Related papers (2024-05-28T08:01:26Z)
Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study [1.03590082373586]
We propose using large language models (LLMs) to assist in finding vulnerabilities in source code. The aim is to test multiple state-of-the-art LLMs and identify the best prompting strategies. We find that LLMs can pinpoint many more issues than traditional static analysis tools, outperforming traditional tools in terms of recall and F1 scores.
arXiv Detail & Related papers (2024-05-24T14:59:19Z)
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [54.954211216847135]
Existing large language models (LLMs) only reach a correctness rate in the range of 30% to 60%. We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE). STE orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory.
arXiv Detail & Related papers (2024-03-07T18:50:51Z)
Efficient Tool Use with Chain-of-Abstraction Reasoning [65.18096363216574]
Large language models (LLMs) need to ground their reasoning to real-world knowledge. There remain challenges for fine-tuning LLM agents to invoke tools in multi-step reasoning problems. We propose a new method for LLMs to better leverage tools in multi-step reasoning.
arXiv Detail & Related papers (2024-01-30T21:53:30Z)
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios [48.38419686697733]
We propose ToolEyes, a fine-grained system tailored for the evaluation of large language models' tool learning capabilities in authentic scenarios. The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning. ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary between LLMs and the physical world.
arXiv Detail & Related papers (2024-01-01T12:49:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.