Fugu-MT Paper Translation (Abstract): ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?

Paper overview: ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?

arxiv url: http://arxiv.org/abs/2605.11086v1
Date: Mon, 11 May 2026 18:00:14 GMT
Status: Translation complete
Last updated in system: 2026-05-13 21:48:56.340068
Title: ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?
Title (reference translation): ExploitGym: Can AI agents turn security vulnerabilities into real attacks?
Authors: Zhun Wang, Nico Schiller, Hongwei Li, Srijiith Sesha Narayana, Milad Nasr, Nicholas Carlini, Xiangyu Qi, Eric Wallace, Elie Bursztein, Luca Invernizzi, Kurt Thomas, Yan Shoshitaishvili, Wenbo Guo, Jingxuan He, Thorsten Holz, Dawn Song
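Abstract summary: Exploitation is a difficult task because it requires low-level program reasoning. Despite its importance and diagnostic value, exploitation remains under-evaluated. ExploitGym is a large-scale, diverse, and realistic benchmark of the exploitation capabilities of AI agents.
Reference score (independently computed attention score): 92.21756459993695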
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AI agents are rapidly gaining capabilities that could significantly reshape cybersecurity, making rigorous evaluation urgent. A critical capability is exploitation: turning a vulnerability, which is not yet an attack, into a concrete security impact, such as unauthorized file access or code execution. Exploitation is a particularly challenging task because it requires low-level program reasoning (e.g., about memory layout), runtime adaptation, and sustained progress over long horizons. Meanwhile, it is inherently dual-use, supporting defensive workflows while lowering the barrier for offense. Despite its importance and diagnostic value, exploitation remains under-evaluated. To address this gap, we introduce ExploitGym, a large-scale, diverse, realistic benchmark on the exploitation capabilities of AI agents. Given a program input that triggers a vulnerability, ExploitGym tasks agents with progressively extending it into a working exploit. The benchmark comprises 898 instances sourced from real-world vulnerabilities across three domains, including userspace programs, Google's V8 JavaScript engine, and the Linux kernel. We vary the security protections applied to each instance, isolating their impact on agent performance. All configurations are packaged in reproducible containerized environments. Our evaluation shows that while exploitation remains challenging, frontier models can successfully exploit a non-trivial fraction of vulnerabilities. For example, the strongest configurations are Anthropic's latest model Claude Mythos Preview and OpenAI's GPT-5.5, which produce working exploits for 157 and 120 instances, respectively. Notably, even with widely used defenses enabled, models retain non-trivial success rates. These results establish ExploitGym as an effective testbed for exploitation and highlight the growing cybersecurity risks posed by increasingly capable AI agents.
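Abstract (reference translation): AI agents are rapidly gaining capabilities that could significantly reshape cybersecurity, which makes rigorous evaluation urgent. A critical capability is exploitation: turning a vulnerability, which is not yet an attack, into a concrete security impact such as unauthorized file access or code execution. Exploitation is a particularly difficult task because it requires low-level program reasoning (for example, about memory layout), runtime adaptation, and sustained progress over long horizons. It is also inherently dual-use, supporting defensive workflows while lowering the barrier to offense. Despite its importance and diagnostic value, exploitation remains under-evaluated. To address this gap, we introduce ExploitGym, a large-scale, diverse, and realistic benchmark of the exploitation capabilities of AI agents. Given a program input that triggers a vulnerability, ExploitGym tasks agents with progressively extending it into a working exploit. The benchmark consists of 898 instances drawn from real-world vulnerabilities across three domains: userspace programs, Google's V8 JavaScript engine, and the Linux kernel. We vary the security protections applied to each instance to isolate their effect on agent performance. All configurations are packaged in reproducible containerized environments. Our evaluation shows that exploitation remains challenging, but frontier models can successfully exploit a non-trivial fraction of the vulnerabilities. For example, the strongest configurations, Anthropic's latest model Claude Mythos Preview and OpenAI's GPT-5.5, produce working exploits for 157 and 120 instances, respectively. Notably, models retain non-trivial success rates even when widely used defenses are enabled. These results establish ExploitGym as an effective testbed for exploitation and highlight the growing cybersecurity risks posed by increasingly capable AI agents.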


Related papers