Fugu-MT Paper Translation (Abstract): Specification-Guided Vulnerability Detection with Large Language Models

Paper overview: Specification-Guided Vulnerability Detection with Large Language Models

arxiv url: http://arxiv.org/abs/2511.04014v1
Date: Thu, 06 Nov 2025 03:21:46 GMT
Status: Translation complete
System update date: 2025-11-07 20:17:53.286327
Title: Specification-Guided Vulnerability Detection with Large Language Models
Authors: Hao Zhu, Jia Li, Cuiyun Gao, Jiaru Qian, Yihong Dong, Huanyu Liu, Lecheng Wang, Ziliang Wang, Xiaolong Hu, Ge Li
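Abstract summary: VulInstruct is a specification-guided approach that extracts security specifications from historical vulnerabilities to detect new ones. On PrimeVul, VulInstruct achieves a 45.0% F1-score (32.7% improvement) and 37.7% recall (50.8% improvement) compared to baselines.
Reference score (independently computed attention score): 32.77684612568584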
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have achieved remarkable progress in code understanding tasks. However, they demonstrate limited performance in vulnerability detection and struggle to distinguish vulnerable code from patched code. We argue that LLMs lack understanding of security specifications -- the expectations about how code should behave to remain safe. When code behavior differs from these expectations, it becomes a potential vulnerability. However, such knowledge is rarely explicit in training data, leaving models unable to reason about security flaws. We propose VulInstruct, a specification-guided approach that systematically extracts security specifications from historical vulnerabilities to detect new ones. VulInstruct constructs a specification knowledge base from two perspectives: (i) General specifications from high-quality patches across projects, capturing fundamental safe behaviors; and (ii) Domain-specific specifications from repeated violations in particular repositories relevant to the target code. VulInstruct retrieves relevant past cases and specifications, enabling LLMs to reason about expected safe behaviors rather than relying on surface patterns. We evaluate VulInstruct under strict criteria requiring both correct predictions and valid reasoning. On PrimeVul, VulInstruct achieves 45.0% F1-score (32.7% improvement) and 37.7% recall (50.8% improvement) compared to baselines, while uniquely detecting 24.3% of vulnerabilities -- 2.4x more than any baseline. In pair-wise evaluation, VulInstruct achieves 32.3% relative improvement. VulInstruct also discovered a previously unknown high-severity vulnerability (CVE-2025-56538) in production code, demonstrating practical value for real-world vulnerability discovery. All code and supplementary materials are available at https://github.com/zhuhaopku/VulInstruct-temp.


Related papers