Fugu-MT paper translation (summary): Before You Hand Over the Wheel: Evaluating LLMs for Security Incident Analysis

Paper summary: Before You Hand Over the Wheel: Evaluating LLMs for Security Incident Analysis

arxiv url: http://arxiv.org/abs/2603.06422v1
Date: Fri, 06 Mar 2026 15:58:30 GMT
Status: Translation complete
Last updated in system: 2026-03-09 13:17:46.188316
Title: Before You Hand Over the Wheel: Evaluating LLMs for Security Incident Analysis
Authors: Sourov Jajodia, Madeena Sultana, Suryadipta Majumdar, Adrian Taylor, Grant Vandenberghe,
Abstract summary: Security incident analysis poses a major challenge for security operations centers. This paper introduces SIABENCH, an agentic evaluation framework for security incident analysis.
Reference score (independently computed attention metric): 1.6786702848693926
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Security incident analysis (SIA) poses a major challenge for security operations centers, which must manage overwhelming alert volumes, large and diverse data sources, complex toolchains, and limited analyst expertise. These difficulties intensify because incidents evolve dynamically and require multi-step, multifaceted reasoning. Although organizations are eager to adopt Large Language Models (LLMs) to support SIA, the absence of rigorous benchmarking creates significant risks for assessing their effectiveness and guiding design decisions. Benchmarking is further complicated by: (i) the lack of an LLM-ready dataset covering a wide spectrum of SIA tasks; (ii) the continual emergence of new tasks reflecting the diversity of analyst responsibilities; and (iii) the rapid release of new LLMs that must be incorporated into evaluations. In this paper, we address these challenges by introducing SIABENCH, an agentic evaluation framework for security incident analysis. First, we construct a first-of-its-kind dataset comprising two major SIA task categories: (i) deep analysis workflows for security incidents (25 scenarios) and (ii) alert-triage tasks (135 scenarios). Second, we implement an agent capable of autonomously performing a broad spectrum of SIA tasks (including network and memory forensics, malware analysis across binary/code/PDF formats, phishing email and kit analysis, log analysis, and false-alert detection). Third, we benchmark 11 major LLMs (spanning both open- and closed-weight models) on these tasks, with extensibility to support emerging models and newly added analysis scenarios.

Related papers