Fugu-MT Paper Translation (Summary): Semantic-Aware Fuzzing: An Empirical Framework for LLM-Guided, Reasoning-Driven Input Mutation

Paper Overview: Semantic-Aware Fuzzing: An Empirical Framework for LLM-Guided, Reasoning-Driven Input Mutation

arxiv url: http://arxiv.org/abs/2509.19533v1
Date: Tue, 23 Sep 2025 19:57:29 GMT
Status: Translation complete
Last updated (in system): 2025-09-25 20:53:19.589297
Title: Semantic-Aware Fuzzing: An Empirical Framework for LLM-Guided, Reasoning-Driven Input Mutation
Authors: Mengdi Lu, Steven Ding, Furkan Alaca, Philippe Charland,
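Abstract summary: Security vulnerabilities in Internet-of-Things devices, mobile platforms, and autonomous systems remain critical. Traditional mutation-based fuzzers primarily perform byte- or bit-level edits without semantic reasoning. This paper presents an open-source framework that integrates LLMs with AFL++ on Google's FuzzBench.
Reference score (attention score calculated by the site): 0.5336076422485075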
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Security vulnerabilities in Internet-of-Things devices, mobile platforms, and autonomous systems remain critical. Traditional mutation-based fuzzers, while effectively exploring code paths, primarily perform byte- or bit-level edits without semantic reasoning. Coverage-guided tools such as AFL++ use dictionaries, grammars, and splicing heuristics to impose shallow structural constraints, leaving deeper protocol logic, inter-field dependencies, and domain-specific semantics unaddressed. Conversely, reasoning-capable large language models (LLMs) can leverage pretraining knowledge to understand input formats, respect complex constraints, and propose targeted mutations, much like an experienced reverse engineer or testing expert. However, the lack of ground truth for "correct" mutation reasoning makes supervised fine-tuning impractical, motivating exploration of off-the-shelf LLMs via prompt-based few-shot learning. To bridge this gap, we present an open-source microservices framework that integrates reasoning LLMs with AFL++ on Google's FuzzBench, tackling the asynchronous execution and divergent hardware demands (GPU- vs. CPU-intensive) of LLMs and fuzzers. We evaluate four research questions: (R1) How can reasoning LLMs be integrated into the fuzzing mutation loop? (R2) Do few-shot prompts yield higher-quality mutations than zero-shot prompts? (R3) Can prompt engineering with off-the-shelf models improve fuzzing directly? (R4) Which open-source reasoning LLMs perform best under prompt-only conditions? Experiments with Llama3.3, Deepseek-r1-Distill-Llama-70B, QwQ-32B, and Gemma3 highlight Deepseek as the most promising. Mutation effectiveness depends more on prompt complexity and model choice than on shot count. Response latency and throughput bottlenecks remain key obstacles, offering directions for future work.
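The integration problem in R1 comes from a mismatch between the two components. LLM calls are slow and GPU-bound, while the fuzzing loop is fast and CPU-bound, so the two have to run asynchronously. The Python sketch below shows one way to decouple them. It is a minimal sketch under stated assumptions, not the authors' framework and not the AFL++ custom-mutator API. It assumes an in-process thread and queue in place of separate microservices, a hypothetical `fake_llm_mutate` stub in place of a real model endpoint, and hypothetical prompt wording in `build_prompt` (zero-shot when no examples are passed, few-shot otherwise). The fuzzing loop never waits on the model. It uses an LLM-proposed input when one is queued and otherwise falls back to a classic single-bit flip.

```python
import queue
import random
import threading
import time


def build_prompt(seed: bytes, examples: list[tuple[bytes, bytes]]) -> str:
    """Hypothetical prompt: zero-shot if `examples` is empty, few-shot otherwise."""
    lines = [
        "You are a fuzzing expert. Mutate the input so it stays well-formed",
        "but exercises unusual field values. Reply with the mutated input only.",
    ]
    for original, mutated in examples:  # each pair becomes one in-context "shot"
        lines.append(f"Example input: {original!r}\nExample mutation: {mutated!r}")
    lines.append(f"Input: {seed!r}\nMutation:")
    return "\n".join(lines)


def byte_flip(data: bytes, rng: random.Random) -> bytes:
    """Fallback mutator: flip one random bit, with no semantic reasoning."""
    if not data:
        return bytes([rng.randrange(256)])
    buf = bytearray(data)
    pos = rng.randrange(len(buf))
    buf[pos] ^= 1 << rng.randrange(8)
    return bytes(buf)


def fake_llm_mutate(prompt: str) -> bytes:
    """Hypothetical stand-in for a remote LLM service; sleeps to mimic latency."""
    time.sleep(0.01)
    return b'{"len": 65535, "data": ""}'


def llm_worker(seeds, examples, out: queue.Queue, stop: threading.Event) -> None:
    """Producer: keeps generating LLM mutations until told to stop."""
    while not stop.is_set():
        for seed in seeds:
            if stop.is_set():
                return
            out.put(fake_llm_mutate(build_prompt(seed, examples)))


def fuzz_loop(seeds, iterations, examples=(), exec_seconds=0.0, rng_seed=0):
    """Consumer: the fuzzing loop, which never blocks waiting for the LLM."""
    rng = random.Random(rng_seed)
    llm_queue: queue.Queue = queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(
        target=llm_worker, args=(seeds, list(examples), llm_queue, stop), daemon=True
    )
    worker.start()
    stats = {"llm": 0, "bitflip": 0}
    executed = []
    for _ in range(iterations):
        try:
            candidate = llm_queue.get_nowait()  # an LLM mutation is ready
            stats["llm"] += 1
        except queue.Empty:
            candidate = byte_flip(rng.choice(seeds), rng)  # fall back instantly
            stats["bitflip"] += 1
        time.sleep(exec_seconds)  # placeholder for running the target on `candidate`
        executed.append(candidate)
    stop.set()
    worker.join()
    return executed, stats
```

With a small simulated execution cost, e.g. `fuzz_loop([b'{"len": 4}'], iterations=200, exec_seconds=0.001)`, some iterations consume LLM mutations and the rest fall back to bit flips. The exact split depends on timing, which mirrors the latency and throughput bottleneck the abstract reports.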


Related Papers