Fugu-MT paper translation (abstract): From IOCs to Regex: Automating CTI Operationalization for SOC with LLMs

Paper overview: From IOCs to Regex: Automating CTI Operationalization for SOC with LLMs

arxiv url: http://arxiv.org/abs/2604.12228v1
Date: Tue, 14 Apr 2026 03:08:46 GMT
Status: Translation complete
Last updated in system: 2026-04-15 19:11:32.210478
Title: From IOCs to Regex: Automating CTI Operationalization for SOC with LLMs
Title (reference translation): From IOCs to Regex: Automating CTI Operationalization for SOC with LLMs
Authors: Pei-Yu Tseng, Lan Zhang, ZihDwo Yeh, Xiaoyan Sun, Xushu Dai, Peng Liu
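Abstract summary: This paper introduces IOCRegex-gen, an automated system that converts IOCs into regular expressions. IOCRegex-gen achieves an average hit rate of 99.1% and a false-positive rate of 0.8%, demonstrating its effectiveness for large-scale CTI processing and automated regex generation.
Reference score (independently computed attention score): 10.073504563975394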
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cyber Threat Intelligence (CTI) reports contain Indicators of Compromise (IOCs) that are critical for security operations. To operationalize these IOCs across heterogeneous logs, analysts often convert them into regular expressions (regexes) for tasks such as digital forensics, log parsing, and SIEM rule creation. However, regex construction is still largely manual, requiring analysts to extract IOCs from CTI reports and transform them into syntactically valid and semantically precise patterns. This process is slow, error-prone, and increasingly impractical as CTI volumes grow. Although recent studies have applied Large Language Models (LLMs) to IOC extraction, they typically output plain strings rather than regexes, limiting practical deployment. Plain IOCs cannot effectively capture variations in system context, log format, or attacker behavior. To address this gap, we propose IOCRegex-gen, a fully automated LLM-based regex generation system that converts IOCs into regexes. The system introduces two key innovations: (i) a group-aware mechanism that identifies which IOC segments should be represented as capture or non-capture groups, and (ii) an iterative reasoning and multi-stage validation pipeline to ensure syntactic validity and semantic correctness. Experiments on over 3,000 real CTI reports and 2,400 ground-truth strings from the MITRE ATT&CK Evaluation framework show that IOCRegex-gen achieves an average hit rate of 99.1% and a false-positive rate of only 0.8%, demonstrating its effectiveness for large-scale CTI processing and automated regex generation.
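Abstract (reference translation): Cyber Threat Intelligence (CTI) reports contain Indicators of Compromise (IOCs) that are essential to security operations. To use these IOCs across heterogeneous logs, analysts often convert them into regular expressions (regexes) for tasks such as digital forensics, log parsing, and SIEM rule creation. Regex construction, however, remains largely manual: analysts must extract IOCs from CTI reports and turn them into syntactically valid, semantically precise patterns. The process is slow and error-prone, and it becomes increasingly impractical as CTI volumes grow. Recent studies have applied Large Language Models (LLMs) to IOC extraction, but they typically output plain strings rather than regexes, which limits practical deployment. Plain IOCs cannot effectively capture variations in system context, log format, or attacker behavior. To address this gap, the authors propose IOCRegex-gen, a fully automated LLM-based regex generation system that converts IOCs into regexes. The system introduces two key innovations: (i) a group-aware mechanism that identifies which IOC segments should be represented as capture or non-capture groups, and (ii) an iterative reasoning and multi-stage validation pipeline that ensures syntactic validity and semantic correctness. In experiments on more than 3,000 real CTI reports and 2,400 ground-truth strings from the MITRE ATT&CK Evaluation framework, IOCRegex-gen achieves an average hit rate of 99.1% and a false-positive rate of only 0.8%, showing that it is effective for large-scale CTI processing and automated regex generation.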
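
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline and involves no LLM: the regex is written by hand to show why a pattern with capture and non-capture groups generalizes better than a plain IOC string, and how a generated pattern might be checked. The IOC, the log lines, the function names, and the definitions of hit rate and false-positive rate used here are all illustrative assumptions; the abstract does not state how the paper computes these metrics.

```python
import re

# Hypothetical plain IOC, as it might appear verbatim in a CTI report.
PLAIN_IOC = r"C:\Users\alice\AppData\Local\Temp\payload.exe"

# Hand-written generalization (an assumption, not output of IOCRegex-gen):
#   (?:[A-Za-z])    non-capture group: any drive letter
#   (?:[^\\]+)      non-capture group: any user name
#   (payload\.exe)  capture group: the file name an analyst wants to extract
IOC_REGEX = (
    r"(?i)(?:[A-Za-z]):\\Users\\(?:[^\\]+)"
    r"\\AppData\\Local\\Temp\\(payload\.exe)"
)

# Hypothetical strings that should match (same behavior, different context).
POSITIVES = [
    r"C:\Users\alice\AppData\Local\Temp\payload.exe",
    r"D:\Users\bob\AppData\Local\Temp\PAYLOAD.EXE",
    r"process_create image=c:\users\svc-web\appdata\local\temp\payload.exe pid=4242",
]

# Hypothetical benign strings that should not match.
NEGATIVES = [
    r"C:\Users\alice\AppData\Local\Temp\setup.exe",
    r"C:\Windows\System32\notepad.exe",
]


def is_syntactically_valid(pattern: str) -> bool:
    """Syntactic check: does the pattern compile at all?"""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def hit_rate(pattern: str, positives: list[str]) -> float:
    """Fraction of expected-match strings that the pattern finds."""
    compiled = re.compile(pattern)
    return sum(1 for s in positives if compiled.search(s)) / len(positives)


def false_positive_rate(pattern: str, negatives: list[str]) -> float:
    """Fraction of expected-non-match strings that the pattern wrongly finds."""
    compiled = re.compile(pattern)
    return sum(1 for s in negatives if compiled.search(s)) / len(negatives)


if __name__ == "__main__":
    plain_hits = sum(PLAIN_IOC in s for s in POSITIVES)
    print("plain-string hits:", plain_hits, "of", len(POSITIVES))
    print("regex valid:", is_syntactically_valid(IOC_REGEX))
    print("regex hit rate:", hit_rate(IOC_REGEX, POSITIVES))
    print("regex false-positive rate:", false_positive_rate(IOC_REGEX, NEGATIVES))
    print("captured:", re.search(IOC_REGEX, POSITIVES[2]).group(1))
```

In this toy data the plain string matches only the log line it was copied from, while the grouped pattern also covers a different drive, user, letter case, and log format. A real validation stage would need far larger and more varied positive and negative sets than the handful of strings assumed here.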

