Fugu-MT Paper Translation (Abstract): One Bug, Hundreds Behind: LLMs for Large-Scale Bug Discovery

Paper summary: One Bug, Hundreds Behind: LLMs for Large-Scale Bug Discovery

arxiv url: http://arxiv.org/abs/2510.14036v1
Date: Wed, 15 Oct 2025 19:18:06 GMT
Status: Translation complete
System update date: 2025-10-17 21:15:14.593987
Title: One Bug, Hundreds Behind: LLMs for Large-Scale Bug Discovery
Title (reference translation): One Bug, Hundreds of Bugs: LLMs for Large-Scale Bug Discovery
Authors: Qiushi Wu, Yue Xiao, Dhilung Kirat, Kevin Eykholt, Jiyong Jang, Douglas Lee Schales,
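Abstract summary: Recurring Pattern Bugs (RPBs) appear repeatedly across various code segments of a program. RPBs are widespread and can significantly compromise the security of software programs. This paper introduces BugStone, a program analysis system powered by LLVM and a Large Language Model (LLM).
Reference score (independently computed attention score): 11.169105079732864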
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fixing bugs in large programs is a challenging task that demands substantial time and effort. Once a bug is found, it is reported to the project maintainers, who work with the reporter to fix it and eventually close the issue. However, across the program, there are often similar code segments, which may also contain the bug, but were missed during discovery. Finding and fixing each recurring bug instance individually is labor-intensive. Even more concerning, bug reports can inadvertently widen the attack surface as they provide attackers with an exploitable pattern that may be unresolved in other parts of the program. In this paper, we explore these Recurring Pattern Bugs (RPBs), which appear repeatedly across various code segments of a program or even in different programs, stem from the same root cause, and remain unresolved. Our investigation reveals that RPBs are widespread and can significantly compromise the security of software programs. This paper introduces BugStone, a program analysis system empowered by LLVM and a Large Language Model (LLM). The key observation is that many RPBs have one patched instance, which can be leveraged to identify a consistent error pattern, such as a specific API misuse. By examining the entire program for this pattern, it is possible to identify similar sections of code that may be vulnerable. Starting with 135 unique RPBs, BugStone identified more than 22K new potential issues in the Linux kernel. Manual analysis of 400 of these findings confirmed that 246 were valid. We also created a dataset from over 1.9K security bugs reported by 23 recent top-tier conference works. We manually annotate the dataset and identify 80 recurring patterns and 850 corresponding fixes. Even with a cost-efficient model choice, BugStone achieved 92.2% precision and 79.1% pairwise accuracy on the dataset.
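Abstract (reference translation): Fixing bugs in large programs is a difficult task that takes considerable time and effort. Once a bug is found, it is reported to the project maintainers, who work with the reporter to fix it and eventually close the issue. Across the program, however, there are often similar code segments that may contain the same bug but were missed at discovery time. Finding and fixing each recurring bug instance individually is labor-intensive. Moreover, bug reports can inadvertently widen the attack surface, because they give attackers an exploitable pattern that may remain unresolved in other parts of the program. This paper studies Recurring Pattern Bugs (RPBs), which appear repeatedly across various code segments of a program or even in different programs, stem from the same root cause, and remain unresolved. Our investigation shows that RPBs are widespread and can significantly compromise the security of software programs. The paper introduces BugStone, a program analysis system powered by LLVM and a Large Language Model (LLM). The key observation is that many RPBs have one patched instance, which can be used to identify a consistent error pattern such as a specific API misuse. Examining the entire program for this pattern makes it possible to identify similar code sections that may be vulnerable. Starting from 135 unique RPBs, BugStone identified more than 22K new potential issues in the Linux kernel. Manual analysis of 400 of these findings confirmed that 246 were valid. We also created a dataset from over 1.9K security bugs reported by 23 recent top-tier conference works. We manually annotate the dataset and identify 80 recurring patterns and 850 corresponding fixes. Even with a cost-efficient model choice, BugStone achieved 92.2% precision and 79.1% pairwise accuracy on the dataset.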
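
The key observation in the abstract (one patched instance reveals an error pattern that can then be searched for across the whole program) can be illustrated with a deliberately simplistic sketch. This is not BugStone's method, which the abstract describes as built on LLVM and an LLM; it is a plain textual scan under stated assumptions. The pattern chosen here (an allocation result used without a NULL check), the function name `find_unchecked_allocations`, the `window` parameter, and the sample C snippet are all hypothetical and introduced only for illustration.

```python
import re

# Hypothetical error pattern, assumed for illustration only:
# the result of an allocation call is assigned but never NULL-checked
# within the next few lines.
ALLOC_ASSIGN = re.compile(r"\b(\w+)\s*=\s*(kmalloc|kzalloc)\s*\(")


def find_unchecked_allocations(source: str, window: int = 3) -> list[tuple[int, str]]:
    """Return (1-based line number, variable) for allocations with no nearby NULL check."""
    lines = source.splitlines()
    findings = []
    for idx, line in enumerate(lines):
        match = ALLOC_ASSIGN.search(line)
        if match is None:
            continue
        var = match.group(1)
        # Accept either `if (!var)` or `if (var == NULL)` as a check.
        check = re.compile(
            rf"if\s*\(\s*(!\s*{re.escape(var)}\b|{re.escape(var)}\s*==\s*NULL)"
        )
        following = lines[idx + 1 : idx + 1 + window]
        if not any(check.search(candidate) for candidate in following):
            findings.append((idx + 1, var))
    return findings


# Hypothetical C fragment: the first allocation mirrors a "patched" instance,
# the second repeats the same mistake elsewhere.
SAMPLE = """\
buf = kmalloc(len, GFP_KERNEL);
if (!buf)
    return -ENOMEM;
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
ctx->len = len;
"""

print(find_unchecked_allocations(SAMPLE))  # [(4, 'ctx')]
```

A line-based regular expression like this misses checks that sit further away or are expressed differently, and it cannot follow data flow, which is presumably why a real system relies on compiler-level analysis rather than text matching.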

Related papers