Fugu-MT Paper Translation (Abstract): The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code

Paper Overview: The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code

arxiv url: http://arxiv.org/abs/2605.03619v1
Date: Tue, 05 May 2026 10:44:49 GMT
Status: Translation complete
Last updated in system: 2026-05-06 19:35:43.900692
Title: The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code
Title (reference translation): The Infinite Mutation Engine: Measuring Polymorphism in LLM-Generated Offensive Code
Authors: Gabriel Hortea, Juan Tapiador
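Abstract summary: We measure the polymorphic capacity of a commercial model as an automated malware generator. We produce payloads in two settings: prompts that specify only functional requirements, and prompts that inject a structured history of prior outcomes to force divergence.
Reference score (attention score computed independently by this site): 0.6359663723794672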
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Malware authors have traditionally relied on polymorphic techniques to produce variants in the same malware family, complicating signature-based detection. Integrating generative AI into offensive toolchains enables attackers to synthesize structurally diverse payloads with identical behavior, raising the question of how much polymorphism LLMs provide. Recent work has assumed that LLMs can produce sufficiently polymorphic payloads, leaving unquantified the variation that emerges when an attacker repeatedly builds the same payload, or explicitly instructs the model to avoid prior implementations. In this work, we measure the polymorphic capacity of a commercial model (Claude Opus 4.6) as an automated malware generator. We build a dual-agent, four-stage pipeline that generates, tests, and refines a data-exfiltration payload comprising file traversal, encryption, exfiltration, and integration. We produce payloads in two settings: using prompts that specify only functional requirements, and using prompts that inject a structured history of prior outcomes to force divergence. We measure pairwise distances along structural (AST) and semantic (embedding) axes, finding that when polymorphism is not explicitly required, structural distances are high while semantic distances remain low; i.e., implementations diverge widely without changing high-level behavior. Explicit prompting substantially amplifies this structural diversity while preserving correctness, at the cost of roughly 5 times more tokens but only a small increase in LLM calls (from 4.2 to 4.5 per payload, with effective API costs of $0.41 and $0.73). These results show that a single commercial LLM can cheaply generate large populations of behaviorally equivalent yet structurally diverse payloads, facilitating the evasion of signature-based detection rules and similarity-based clustering.
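Abstract (reference translation): Malware authors have traditionally relied on polymorphic techniques to produce variants within the same malware family, complicating signature-based detection. Integrating generative AI into offensive toolchains lets attackers synthesize structurally diverse payloads with identical behavior. Recent work has assumed that LLMs can produce sufficiently polymorphic payloads, leaving unquantified the variation that arises when an attacker repeatedly builds the same payload or explicitly instructs the model to avoid prior implementations. In this work, we measure the polymorphic capacity of a commercial model (Claude Opus 4.6) as an automated malware generator. We build a dual-agent, four-stage pipeline that generates, tests, and refines a data-exfiltration payload comprising file traversal, encryption, exfiltration, and integration. We produce payloads in two settings: prompts that specify only functional requirements, and prompts that inject a structured history of prior outcomes to force divergence. We measure pairwise distances along structural (AST) and semantic (embedding) axes and find that, when polymorphism is not explicitly required, structural distances are high while semantic distances remain low. Explicit prompting substantially amplifies this structural diversity while preserving correctness, at the cost of roughly 5 times more tokens but only a small increase in LLM calls (from 4.2 to 4.5 per payload, with effective API costs of $0.41 and $0.73). These results suggest that a single commercial LLM can cheaply generate behaviorally equivalent yet structurally diverse payloads, facilitating the evasion of signature-based detection rules and similarity-based clustering.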
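
Note on the structural (AST) axis: the abstract reports pairwise AST distances but does not say which metric is used. The following is only a minimal sketch of the general idea, assuming a stand-in metric (one minus the difflib similarity ratio over flattened AST node-type sequences). The function names, the sample programs, and the metric itself are illustrative assumptions, not the authors' method, and the samples are deliberately benign.

```python
import ast
from difflib import SequenceMatcher
from itertools import combinations


def node_types(source):
    """Flatten a Python program into the list of its AST node type names."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def structural_distance(a, b):
    """Assumed stand-in metric: 1 - similarity of node-type sequences, in [0, 1]."""
    matcher = SequenceMatcher(None, node_types(a), node_types(b), autojunk=False)
    return 1.0 - matcher.ratio()


def pairwise_distances(sources):
    """Distance for every unordered pair of programs, keyed by index pair."""
    return {
        (i, j): structural_distance(sources[i], sources[j])
        for i, j in combinations(range(len(sources)), 2)
    }


# Hypothetical, harmless samples: three ways to sum a list.
SAMPLES = [
    "def total(xs):\n    return sum(xs)\n",
    "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n",
    "def add_all(values):\n    return sum(values)\n",
]

DISTANCES = pairwise_distances(SAMPLES)
for pair, dist in sorted(DISTANCES.items()):
    print(pair, round(dist, 3))
```

Samples 0 and 2 differ only in identifier names, so this node-type metric treats them as structurally identical, while sample 1 uses a different control-flow shape and lands farther away. A semantic (embedding) axis would need a separate model and is not sketched here.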

