Fugu-MT paper translation (abstract): The Emergence of Autonomous Penetration Capabilities in Large Language Model-Powered AI Systems

Paper summary: The Emergence of Autonomous Penetration Capabilities in Large Language Model-Powered AI Systems

arxiv url: http://arxiv.org/abs/2606.13079v1
Date: Thu, 11 Jun 2026 09:02:14 GMT
Status: translation complete
Last updated in system: 2026-06-12 15:55:27.687286
Title: The Emergence of Autonomous Penetration Capabilities in Large Language Model-Powered AI Systems
Title (reference translation): The Emergence of Autonomous Penetration Capabilities in Large Language Model-Driven AI Systems
Authors: Jiaqi Luo, Jiarun Dai, Zhile Chen, Jia Xu, Weibing Wang, Yawen Duan, Brian Tse, Geng Hong, Xudong Pan, Yuan Zhang, Min Yang
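Abstract summary: A growing body of work seeks to assess the autonomous penetration capabilities of AI systems. The paper constructs a new autonomous penetration evaluation framework consisting of two components: target servers and agent scaffolding. Current models achieve penetration success rates ranging from 10.7% to 69.3%.
Reference score (independently computed attention score): 21.83197937022436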
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The autonomous execution of cyberattacks capable of causing substantial real-world harm is widely regarded as one of the critical red lines that frontier AI systems must not cross. Within this broader red-line scenario, autonomous penetration represents a core enabling capability and subtask: the ability of LLM-powered AI systems to independently conduct adversarial operations against a target server without human intervention, identify and exploit vulnerabilities, and obtain unauthorized access or control. A growing body of work has sought to assess the autonomous penetration capabilities of AI systems. However, existing evaluations often employ opaque methodologies, rely on unrealistic or overly simplified penetration-testing scenarios, or provide LLMs with excessive prior knowledge and task-specific guidance. As a result, they cannot accurately capture the extent to which modern AI systems can autonomously perform this core capability within broader high-impact cyberattack scenarios. To address these limitations, we construct a new autonomous penetration evaluation framework consisting of two components: target servers and agent scaffolding. On the target-server side, we design two levels of target environments based on the number of secure services (services without known vulnerabilities) deployed alongside a vulnerable service: Tier 1 (one secure service) and Tier 2 (three secure services), for a total of 300 target servers. The agent scaffolding adopts a general-purpose agent architecture equipped with a set of general-purpose cybersecurity tools, without any target-specific prior knowledge. We evaluate 19 open-weight and proprietary LLMs and find that current models achieve penetration success rates ranging from 10.7% to 69.3%. Moreover, we observe that autonomous penetration capability continues to improve alongside advances in overall model capability.
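Abstract (reference translation): Today, the autonomous execution of cyberattacks that cause large-scale real-world harm is widely regarded as one of the critical red lines that frontier AI systems must not cross. Within this broader red-line scenario, autonomous penetration represents a core capability and subtask: the ability of LLM-based AI systems to independently carry out adversarial operations against a target server without human intervention, identify and exploit vulnerabilities, and obtain unauthorized access or control. A growing body of work seeks to assess the autonomous penetration capabilities of AI systems. However, existing evaluations often employ opaque methodologies, rely on unrealistic or overly simplified penetration-testing scenarios, or provide LLMs with excessive prior knowledge and task-specific guidance, so they cannot accurately capture the extent to which modern AI systems can autonomously perform this core capability within broader cyberattack scenarios. To address these limitations, we construct a new autonomous penetration evaluation framework consisting of two components: target servers and agent scaffolding. Specifically, on the target-server side, we design two levels of target environments based on the number of secure services without known vulnerabilities deployed alongside a vulnerable service. Meanwhile, the agent scaffolding adopts a general-purpose agent architecture equipped with a set of general-purpose cybersecurity tools, without any target-specific prior knowledge. We evaluate 19 open-weight and proprietary LLMs and find that current models achieve penetration success rates ranging from 10.7% to 69.3%. Moreover, we observe that autonomous penetration capability continues to improve as overall model capability advances.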

Related papers