Fugu-MT Paper Translation (Summary): Trace: Unmasking AI Attack Agents Through Terminal Behavior Fingerprinting

Paper summary: Trace: Unmasking AI Attack Agents Through Terminal Behavior Fingerprinting

arxiv url: http://arxiv.org/abs/2605.01186v1
Date: Sat, 02 May 2026 01:27:20 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:49.628671
Title: Trace: Unmasking AI Attack Agents Through Terminal Behavior Fingerprinting
Authors: Murali Ediga, Sudipta Chattopadhyay
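Abstract summary: We introduce Trace, a novel framework for the attribution and forensics of AI attack agents. Once Trace identifies the model family of the attacker agents, it guides a defensive prompt injection (DPI) strategy against the attacker model. The aim is to extract the system prompt from the attacker model, revealing valuable information.
Reference score (attention score computed in-house): 3.1382935177554336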
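Trace identifies the attacker's model family from the terminal command sequences of its sessions. The abstract does not say which classifier is used, so the following is only a minimal sketch of the general idea under stated assumptions: each session is reduced to program-name unigrams and bigrams, and a multinomial Naive Bayes model (a swapped-in technique, not necessarily the authors') attributes it to a family. All names (`command_ngrams`, `NaiveBayesFingerprinter`, `family_A`, `family_B`) and the toy sessions are hypothetical.

```python
import math
from collections import Counter, defaultdict


def command_ngrams(commands, n=2):
    """Turn a session's terminal commands into program-name n-gram counts.

    Assumption for illustration: only the first token of each command
    (the program name, e.g. 'nmap' or 'cat') is used as a feature.
    """
    tokens = [cmd.split()[0] for cmd in commands if cmd.strip()]
    grams = list(tokens)  # unigrams
    grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)


class NaiveBayesFingerprinter:
    """Hypothetical multinomial Naive Bayes attributor with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.feature_counts = defaultdict(Counter)  # family -> n-gram counts
        self.session_counts = Counter()             # family -> number of sessions
        self.vocab = set()

    def fit(self, sessions, families):
        for commands, family in zip(sessions, families):
            feats = command_ngrams(commands)
            self.feature_counts[family].update(feats)
            self.session_counts[family] += 1
            self.vocab.update(feats)
        return self

    def predict(self, commands):
        feats = command_ngrams(commands)
        total_sessions = sum(self.session_counts.values())
        v = len(self.vocab)
        best_family, best_score = None, -math.inf
        for family, counts in self.feature_counts.items():
            total = sum(counts.values())
            # Log prior of the family plus smoothed log likelihood of each n-gram.
            score = math.log(self.session_counts[family] / total_sessions)
            for feat, k in feats.items():
                p = (counts[feat] + self.alpha) / (total + self.alpha * v)
                score += k * math.log(p)
            if score > best_score:
                best_family, best_score = family, score
        return best_family


# Toy, made-up training sessions: one family scans then probes HTTP,
# the other enumerates the local file system.
train = [
    (["nmap -sV 10.0.0.5", "curl http://10.0.0.5/", "cat /etc/passwd"], "family_A"),
    (["nmap -p- 10.0.0.5", "curl -I http://10.0.0.5/"], "family_A"),
    (["ls -la", "cat flag.txt", "find / -perm -4000"], "family_B"),
    (["ls /home", "cat /etc/passwd", "find / -name '*.conf'"], "family_B"),
]
sessions, families = zip(*train)
model = NaiveBayesFingerprinter().fit(sessions, families)
print(model.predict(["nmap -sC 10.0.0.7", "curl http://10.0.0.7/login"]))  # family_A
```

Naive Bayes is used here only because it is short and transparent; a real attribution pipeline would likely use richer features (arguments, timing, error-recovery patterns) and a stronger classifier.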
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: AI-driven penetration-testing agents can now autonomously execute attacks within compromised networks. Identifying the model family that controls the active sessions of such agents provides valuable information for understanding the intent of the attack and for developing countermeasures. In this paper, we introduce Trace, a novel multi-stage attribution and forensic framework for AI attack agents based on terminal command sequences. Once Trace identifies the model family of the attacker agents, it guides a defensive prompt injection (DPI) strategy that targets the attacker model with a crafted payload. The aim is to exfiltrate the system prompt from the attacker model, revealing valuable information for understanding the attacker's intent and facilitating further forensic investigation. We implemented our approach around a Linux capture-the-flag (CTF) box, with attacker agents built on three distinct scaffolds and seven frontier model families. Our evaluation shows that Trace fingerprints the attacker model family with a macro F1 score of 0.981 (0.815 when generalizing to unseen scaffolds). In addition, the fingerprinting lets Trace tailor the crafted DPI payload to specific model families, extracting the system prompt from 81.9% of non-Claude sessions on average (up to 98.3%) at a Sentence-BERT fidelity of 0.736, a 1.88x improvement over blind deployment. Finally, to validate the robustness of Trace, we evaluate it against a black-box, proprietary scaffold that employs multiple model families (Gemini and Claude Opus). Trace identified the model family with an average accuracy of 78%. Moreover, for the Gemini model family, the DPI employed by Trace revealed the entire system prompt, which the developers confirmed. Trace therefore provides a fundamental first step toward attacker-agent forensics.
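The abstract reports results as a macro F1 score and a "Sentence-BERT fidelity". As a rough illustration, macro F1 is the unweighted mean of per-class F1 scores, so every model family counts equally regardless of how many sessions it has. The abstract does not define the fidelity metric; the sketch below assumes it is the cosine similarity between sentence embeddings of the extracted and the ground-truth system prompt, which is a common convention but an assumption here. The embedding vectors and labels are toy values, not real Sentence-BERT output or data from the paper.

```python
import math


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


def cosine_similarity(u, v):
    """Cosine similarity of two embedding vectors (assumed fidelity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical family labels for four sessions.
y_true = ["A", "A", "B", "C"]
y_pred = ["A", "B", "B", "C"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778

# Toy "embeddings" of an extracted prompt and the ground-truth prompt.
print(round(cosine_similarity([1, 2, 2], [2, 1, 2]), 4))  # 0.8889
```

In the example, families A and B each get F1 = 2/3 and family C gets 1.0, so the macro average is 7/9. This computation should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")` when no class has zero predictions and zero true instances.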


Related papers