Fugu-MT Paper Translation (Summary): Empirical Study of Code Large Language Models for Binary Security Patch Detection

Paper summary: Empirical Study of Code Large Language Models for Binary Security Patch Detection

arxiv url: http://arxiv.org/abs/2509.06052v1
Date: Sun, 07 Sep 2025 13:31:43 GMT
Status: Translation complete
Last updated in system: 2025-09-09 14:07:03.835672
Title: Empirical Study of Code Large Language Models for Binary Security Patch Detection
Authors: Qingyuan Li, Binchang Li, Cuiyun Gao, Shuzheng Gao, Zongjie Li,
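Abstract summary: Security patch detection (SPD) is crucial for maintaining software security. In recent years, many learning-based SPD approaches have shown promising results on source code. However, these approaches cannot be applied to closed-source applications and proprietary systems, which make up a large share of real-world software.
Reference score (independently calculated attention score): 12.110226735365643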
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Security patch detection (SPD) is crucial for maintaining software security, as unpatched vulnerabilities can lead to severe security risks. In recent years, numerous learning-based SPD approaches have demonstrated promising results on source code. However, these approaches typically cannot be applied to closed-source applications and proprietary systems, which constitute a significant portion of real-world software, because such software releases patches only as binary files and its source code is inaccessible. Despite the impressive performance of code large language models (LLMs) in code intelligence and in binary analysis tasks such as decompilation and compilation optimization, their potential for detecting binary security patches remains unexplored, leaving a significant research gap between their demonstrated low-level code understanding capabilities and this critical security task. To address this gap, we construct a large-scale binary patch dataset containing 19,448 samples, with two levels of representation: assembly code and pseudo-code, and systematically evaluate 19 code LLMs of varying scales to investigate their capability in binary SPD tasks. Our initial exploration demonstrates that directly prompted vanilla code LLMs struggle to accurately identify security patches among binary patches, and even state-of-the-art prompting techniques fail to mitigate the lack of binary SPD domain knowledge in vanilla models. Drawing on these initial findings, we further investigate a fine-tuning strategy for injecting binary SPD domain knowledge into code LLMs through the two levels of representation. Experimental results demonstrate that fine-tuned LLMs achieve outstanding performance, with the best results obtained on the pseudo-code representation.
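The direct-prompting setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' prompt or pipeline: the `BinaryPatch` type, the prompt wording, the yes/no answer format, and the `model` callable are all hypothetical, chosen only to show how a binary patch in one of the two representations might be turned into a classification query.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BinaryPatch:
    """Hypothetical sample: one function before and after a patch."""
    representation: str  # "assembly" or "pseudo-code"
    before: str
    after: str


def build_prompt(patch: BinaryPatch) -> str:
    """Assemble a zero-shot prompt (the wording is an assumption)."""
    return (
        f"The following function is shown as {patch.representation} "
        "before and after a patch.\n\n"
        f"### Before\n{patch.before}\n\n"
        f"### After\n{patch.after}\n\n"
        "Is this a security patch? Answer 'yes' or 'no'."
    )


def parse_label(reply: str) -> Optional[bool]:
    """Map a free-form reply to True/False, or None if it is unclear."""
    match = re.match(r"[a-z]+", reply.strip().lower())
    if match is None:
        return None
    first_word = match.group(0)
    if first_word == "yes":
        return True
    if first_word == "no":
        return False
    return None


def classify(patch: BinaryPatch, model: Callable[[str], str]) -> Optional[bool]:
    """Query any text-in/text-out model (hypothetical) and parse its answer."""
    return parse_label(model(build_prompt(patch)))


if __name__ == "__main__":
    sample = BinaryPatch(
        representation="pseudo-code",
        before="memcpy(dst, src, n);",
        after="if (n <= 16) memcpy(dst, src, n);",
    )
    # Stand-in for a real code LLM; it always gives the same reply.
    stub_model = lambda prompt: "Yes, the patch adds a bounds check."
    print(classify(sample, stub_model))
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM library; a fine-tuned model could be substituted behind the same interface.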

Related papers