Fugu-MT Paper Translation (Summary): CyberSleuth: Autonomous Blue-Team LLM Agent for Web Attack Forensics

Paper summary: CyberSleuth: Autonomous Blue-Team LLM Agent for Web Attack Forensics

arxiv url: http://arxiv.org/abs/2508.20643v1
Date: Thu, 28 Aug 2025 10:45:31 GMT
Status: Translation complete
Last updated in system: 2025-08-29 18:12:02.340722
Title: CyberSleuth: Autonomous Blue-Team LLM Agent for Web Attack Forensics
Title (reference translation): CyberSleuth: An Autonomous Blue-Team LLM Agent for Web Attack Forensics
Authors: Stefano Fumero, Kai Huang, Matteo Boffa, Danilo Giordano, Marco Mellia, Zied Ben Houidi, Dario Rossi
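Abstract summary: Large language model (LLM) agents are powerful tools for automating complex tasks. This work is a systematic study of LLM-agent design for the forensic investigation of realistic web application attacks. We propose CyberSleuth, an autonomous agent that processes packet-level traces and application logs.
Reference score (independently computed attention score): 6.749559613197707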
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large Language Model (LLM) agents are powerful tools for automating complex tasks. In cybersecurity, researchers have primarily explored their use in red-team operations such as vulnerability discovery and penetration tests. Defensive uses for incident response and forensics have received comparatively less attention and remain at an early stage. This work presents a systematic study of LLM-agent design for the forensic investigation of realistic web application attacks. We propose CyberSleuth, an autonomous agent that processes packet-level traces and application logs to identify the targeted service, the exploited vulnerability (CVE), and attack success. We evaluate the consequences of core design decisions, spanning tool integration and agent architecture, and provide interpretable guidance for practitioners. We benchmark four agent architectures and six LLM backends on 20 incident scenarios of increasing complexity, identifying CyberSleuth as the best-performing design. In a separate set of 10 incidents from 2025, CyberSleuth correctly identifies the exact CVE in 80% of cases. Finally, we conduct a human study with 22 experts, who rated the reports of CyberSleuth as complete, useful, and coherent. They also expressed a slight preference for DeepSeek R1, which is good news for open-source LLMs. To foster progress in defensive LLM research, we release both our benchmark and the CyberSleuth platform as a foundation for fair, reproducible evaluation of forensic agents.
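Abstract (reference translation): Large language model (LLM) agents are powerful tools for automating complex tasks. In cybersecurity, researchers have mainly studied their use in red-team operations such as vulnerability discovery and penetration testing. Defensive uses for incident response and forensics have received comparatively little attention and remain at an early stage. This work is a systematic study of LLM-agent design for the forensic investigation of realistic web application attacks. We propose CyberSleuth, an autonomous agent that processes packet-level traces and application logs. We evaluate the consequences of core design decisions spanning tool integration and agent architecture, and provide interpretable guidance for practitioners. We benchmarked four agent architectures and six LLM backends on 20 incident scenarios and found CyberSleuth to be the best-performing design. On 10 incidents from 2025, CyberSleuth correctly identified the exact CVE in 80% of cases. Finally, we conducted a human study with 22 experts, who rated CyberSleuth's reports as complete, useful, and coherent. They also slightly preferred DeepSeek R1, which is good news for open-source LLMs. To promote progress in defensive LLM research, we release our benchmark and the CyberSleuth platform as a foundation for fair, reproducible evaluation of forensic agents.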

Related papers