Fugu-MT Paper Translation (Summary): SafeToolBench: Pioneering a Prospective Benchmark to Evaluating Tool Utilization Safety in LLMs

Paper summary: SafeToolBench: Pioneering a Prospective Benchmark to Evaluating Tool Utilization Safety in LLMs

arxiv url: http://arxiv.org/abs/2509.07315v1
Date: Tue, 09 Sep 2025 01:31:25 GMT
Status: Translation complete
Last updated in system: 2025-09-10 14:38:27.160378
Title: SafeToolBench: Pioneering a Prospective Benchmark to Evaluating Tool Utilization Safety in LLMs
Title (reference translation): SafeToolBench: Pioneering a Forward-Looking Benchmark for Evaluating Tool Utilization Safety in LLMs
Authors: Hongfei Xia, Hongru Wang, Zeming Liu, Qian Yu, Yuhang Guo, Haifeng Wang
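Abstract summary: Large language models (LLMs) have shown strong performance in autonomously calling various tools in external environments. This paper focuses on assessing the safety of LLM tool utilization prospectively, aiming to avoid irreversible harm caused by directly executing tools. It proposes SafeToolBench, the first benchmark to comprehensively assess tool utilization security. It also proposes SafeInstructTool, a novel framework that aims to enhance LLMs' awareness of tool utilization security from three perspectives.
Reference score (independently computed attention score): 35.180946816997164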
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have exhibited great performance in autonomously calling various tools in external environments, leading to better problem solving and task automation capabilities. However, these external tools also amplify potential risks such as financial loss or privacy leakage when user instructions are ambiguous or malicious. Unlike previous studies, which mainly assess the safety awareness of LLMs after obtaining the tool execution results (i.e., retrospective evaluation), this paper focuses on prospective ways to assess the safety of LLM tool utilization, aiming to avoid irreversible harm caused by directly executing tools. To this end, we propose SafeToolBench, the first benchmark to comprehensively assess tool utilization security in a prospective manner, covering malicious user instructions and diverse practical toolsets. Additionally, we propose a novel framework, SafeInstructTool, which aims to enhance LLMs' awareness of tool utilization security from three perspectives (i.e., User Instruction, Tool Itself, and Joint Instruction-Tool), leading to nine detailed dimensions in total. We experiment with four LLMs using different methods, revealing that existing approaches fail to capture all risks in tool utilization. In contrast, our framework significantly enhances LLMs' self-awareness, enabling safer and more trustworthy tool utilization.
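Abstract (reference translation): Large language models (LLMs) achieve strong problem solving and task automation by autonomously calling various tools in external environments. However, these external tools amplify potential risks such as financial loss and privacy leakage when user instructions are ambiguous or malicious. In contrast to previous studies, which mainly assess the safety of LLMs after obtaining tool execution results (i.e., retrospective evaluation), this work focuses on prospective ways to assess the safety of LLM tool utilization, with the aim of avoiding irreversible harm caused by directly executing tools. To this end, we propose SafeToolBench, the first benchmark to comprehensively assess tool utilization security, covering malicious user instructions and diverse practical toolsets. In addition, we propose SafeInstructTool, a novel framework that aims to raise LLMs' awareness of tool utilization security. Experiments with four LLMs using different methods reveal that existing approaches fail to capture all risks in tool utilization. In contrast, our framework significantly enhances LLMs' self-awareness, enabling safer and more trustworthy tool utilization.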


Related papers