Fugu-MT Paper Translation (Summary): Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees

Paper summary: Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees

arxiv url: http://arxiv.org/abs/2509.07939v1
Date: Tue, 09 Sep 2025 17:19:33 GMT
Status: Translation complete
Last updated in system: 2025-09-10 14:38:27.420154
Title: Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees
Title (reference translation): Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees
Authors: Katsuaki Nakano, Reza Feyyazi, Shanchieh Jay Yang, Michael Zuzak
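Abstract summary: Existing Large Language Model (LLM) agents for cybersecurity penetration testing rely on self-guided reasoning. We propose a guided reasoning pipeline for penetration testing LLM agents that incorporates a deterministic task tree built from the MITRE ATT&CK Matrix. Using Llama-3-8B, Gemini-1.5, and GPT-4, the pipeline guided the agent through 71.8%, 72.8%, and 78.6% of subtasks, respectively.
Reference score (attention score calculated in-house): 1.2397617816774036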
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in Large Language Models (LLMs) have driven interest in automating cybersecurity penetration testing workflows, offering the promise of faster and more consistent vulnerability assessment for enterprise systems. Existing LLM agents for penetration testing primarily rely on self-guided reasoning, which can produce inaccurate or hallucinated procedural steps. As a result, the LLM agent may undertake unproductive actions, such as exploiting unused software libraries or generating cyclical responses that repeat prior tactics. In this work, we propose a guided reasoning pipeline for penetration testing LLM agents that incorporates a deterministic task tree built from the MITRE ATT&CK Matrix, a proven penetration testing kill chain, to constrain the LLM's reasoning process to explicitly defined tactics, techniques, and procedures. This anchors reasoning in proven penetration testing methodologies and filters out ineffective actions by guiding the agent towards more productive attack procedures. To evaluate our approach, we built an automated penetration testing LLM agent using three LLMs (Llama-3-8B, Gemini-1.5, and GPT-4) and applied it to navigate 10 HackTheBox cybersecurity exercises with 103 discrete subtasks representing real-world cyberattack scenarios. Our proposed reasoning pipeline guided the LLM agent through 71.8%, 72.8%, and 78.6% of subtasks using Llama-3-8B, Gemini-1.5, and GPT-4, respectively. Comparatively, the state-of-the-art LLM penetration testing tool using self-guided reasoning completed only 13.5%, 16.5%, and 75.7% of subtasks and required 86.2%, 118.7%, and 205.9% more model queries. This suggests that incorporating a deterministic task tree into LLM reasoning pipelines can enhance the accuracy and efficiency of automated cybersecurity assessments.
Abstract (reference translation): Recent advances in large language models (LLMs) have sparked interest in automating cybersecurity penetration testing workflows, promising faster and more consistent vulnerability assessment for enterprise systems. Existing LLM agents for penetration testing rely mainly on self-guided reasoning, which can produce inaccurate or hallucinated procedural steps. As a result, an LLM agent can take unproductive actions, such as exploiting unused software libraries or generating cyclical responses that repeat earlier tactics. In this work, we propose a guided reasoning pipeline for penetration testing LLM agents that incorporates a deterministic task tree built from the MITRE ATT&CK Matrix, constraining the LLM's reasoning process to explicitly defined tactics, techniques, and procedures. This anchors reasoning in proven penetration testing methodologies and filters out ineffective actions by steering the agent toward more productive attack procedures. We built an automated penetration testing LLM agent using three LLMs (Llama-3-8B, Gemini-1.5, and GPT-4) and applied it to HackTheBox cybersecurity exercises comprising 103 discrete subtasks that represent real-world cyberattack scenarios. With Llama-3-8B, Gemini-1.5, and GPT-4, the pipeline guided the agent through 71.8%, 72.8%, and 78.6% of subtasks, respectively. By contrast, the state-of-the-art LLM penetration testing tool using self-guided reasoning completed only 13.5%, 16.5%, and 75.7% of subtasks and required 86.2%, 118.7%, and 205.9% more model queries. This suggests that incorporating a deterministic task tree into LLM reasoning pipelines can improve the accuracy and efficiency of automated cybersecurity assessments.
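The abstract describes the task tree only at a high level. As a rough illustration of the general idea of letting a fixed tree of tactics and techniques decide which actions are legal, here is a minimal Python sketch. Everything in it is an assumption for illustration, not the authors' implementation: the names `TaskNode`, `leaves`, `next_tasks`, `mark_done`, and `guided_step` are hypothetical; the tree holds only three real ATT&CK tactic/technique names, far fewer than a tree derived from the full matrix; tactics are assumed to be worked strictly left to right; and the `propose` callback is a stub standing in for an LLM query.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TaskNode:
    """A node in a hypothetical attack task tree (tactic or technique)."""
    name: str
    children: list[TaskNode] = field(default_factory=list)
    done: bool = False

    def is_complete(self) -> bool:
        # A leaf is complete once marked done; an inner node once all children are.
        if not self.children:
            return self.done
        return all(child.is_complete() for child in self.children)


def leaves(node: TaskNode) -> list[TaskNode]:
    """Collect leaf nodes (concrete techniques) in left-to-right tree order."""
    if not node.children:
        return [node]
    result: list[TaskNode] = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def next_tasks(root: TaskNode) -> list[str]:
    """Unfinished techniques of the first incomplete tactic (assumed strict order)."""
    for tactic in root.children:
        if not tactic.is_complete():
            return [leaf.name for leaf in leaves(tactic) if not leaf.done]
    return []  # every tactic is finished


def mark_done(root: TaskNode, name: str) -> bool:
    """Mark the technique called `name` as completed; False if it is not in the tree."""
    for leaf in leaves(root):
        if leaf.name == name:
            leaf.done = True
            return True
    return False


def guided_step(root: TaskNode, propose: Callable[[list[str]], str]) -> Optional[str]:
    """Let the (stubbed) LLM pick a task, but only from the tree's allowed set."""
    allowed = next_tasks(root)
    if not allowed:
        return None
    choice = propose(allowed)
    # Off-tree or out-of-order proposals are discarded in favour of the first
    # allowed task, so the agent never acts on a hallucinated step.
    return choice if choice in allowed else allowed[0]


if __name__ == "__main__":
    # Hypothetical three-tactic tree using real MITRE ATT&CK names.
    tree = TaskNode("root", [
        TaskNode("Reconnaissance", [TaskNode("Active Scanning")]),
        TaskNode("Initial Access", [TaskNode("Exploit Public-Facing Application")]),
        TaskNode("Privilege Escalation", [TaskNode("Exploitation for Privilege Escalation")]),
    ])

    def hallucinating_llm(allowed: list[str]) -> str:
        return "Exploit unused software library"  # not in the tree

    print(guided_step(tree, hallucinating_llm))  # Active Scanning
    mark_done(tree, "Active Scanning")
    print(next_tasks(tree))  # ['Exploit Public-Facing Application']
```

The design choice being illustrated is that the tree, not the model, determines the legal next actions; the model only chooses among them, so an off-tree proposal such as exploiting an unused library is replaced by a valid task instead of being executed, and completed tasks are never offered again, which rules out repeating prior tactics.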


Related papers