Fugu-MT Paper Translation (Summary): Safeguarding LLMs Against Misuse and AI-Driven Malware Using Steganographic Canaries

Paper summary: Safeguarding LLMs Against Misuse and AI-Driven Malware Using Steganographic Canaries

arxiv url: http://arxiv.org/abs/2603.28655v1
Date: Mon, 30 Mar 2026 16:40:55 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:45.514999
Title: Safeguarding LLMs Against Misuse and AI-Driven Malware Using Steganographic Canaries
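Title (reference translation): Protecting LLMs Against Misuse and AI-Driven Malware Using Steganographic Canaries
Authors: Md Raz, Venkata Sai Charan Putrevu, Meet Udeshi, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri
Abstract summary: AI-powered malware increasingly exploits cloud-hosted generative-AI services and large language models. Both this threat and the exposure of sensitive enterprise documents to third-party AI vendors converge at the AI service ingestion boundary, yet existing defenses focus on endpoints and network perimeters. The paper presents a framework based on steganographic canary files.
Reference score (independently computed attention score): 16.391742476325323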
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AI-powered malware increasingly exploits cloud-hosted generative-AI services and large language models (LLMs) as analysis engines for reconnaissance and code generation. Simultaneously, enterprise uploads expose sensitive documents to third-party AI vendors. Both threats converge at the AI service ingestion boundary, yet existing defenses focus on endpoints and network perimeters, leaving organizations with limited visibility once plaintext reaches an LLM service. To address this, we present a framework based on steganographic canary files: realistic documents carrying cryptographically derived identifiers embedded via complementary encoding channels. A pre-ingestion filter extracts and verifies these identifiers before LLM processing, enabling passive, format-agnostic detection without semantic classification. We support two modes of operation where Mode A marks existing sensitive documents with layered symbolic encodings (whitespace substitution, zero-width character insertion, homoglyph substitution), while Mode B generates synthetic canary documents using linguistic steganography (arithmetic coding over GPT-2), augmented with compatible symbolic layers. We model increasing document pre-processing and adversarial capability for both modes via a four-tier transport-transform taxonomy: All methods achieve 100% identifier recovery under benign and sanitization workflows (Tiers 1-2). The hybrid Mode B maintains 97% through targeted adversarial transforms (Tier 3). An end-to-end case study against an LLM-orchestrated ransomware pipeline confirms that both modes detect and block canary-bearing uploads before file encryption begins. To our knowledge, this is the first framework to systematically combine symbolic and linguistic text steganography into layered canary documents for detecting unauthorized LLM processing, evaluated against a transport-threat taxonomy tailored to AI malware.
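Abstract (reference translation): AI-powered malware increasingly uses cloud-hosted generative-AI services and large language models (LLMs) as analysis engines for reconnaissance and code generation. At the same time, enterprise uploads expose sensitive documents to third-party AI vendors. Both threats converge at the AI service ingestion boundary, yet existing defenses focus on endpoints and network perimeters, so organizations have limited visibility once plaintext reaches an LLM service. The paper therefore proposes a framework based on steganographic canary files: realistic documents that carry cryptographically derived identifiers embedded through complementary encoding channels. A pre-ingestion filter extracts and verifies these identifiers before LLM processing, which enables passive, format-agnostic detection without semantic classification. Two modes are supported: Mode A marks existing sensitive documents with layered symbolic encodings (whitespace substitution, zero-width character insertion, homoglyph substitution), and Mode B generates synthetic canary documents using linguistic steganography (arithmetic coding over GPT-2), augmented with compatible symbolic layers. A four-tier transport-transform taxonomy models increasing document pre-processing and adversarial capability for both modes. All methods achieve 100% identifier recovery under benign and sanitization workflows (Tiers 1-2), and the hybrid Mode B maintains 97% under targeted adversarial transforms (Tier 3). An end-to-end case study against an LLM-orchestrated ransomware pipeline confirms that both modes detect and block canary-bearing uploads before file encryption begins. To the authors' knowledge, this is the first framework to systematically combine symbolic and linguistic text steganography into layered canary documents for detecting unauthorized LLM processing, evaluated against a transport-threat taxonomy tailored to AI malware.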
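
The abstract names zero-width character insertion as one symbolic channel and a pre-ingestion filter that extracts and verifies an identifier. The following Python sketch illustrates that idea for a single channel only. It is not the paper's implementation: the identifier layout (a 4-byte document ID followed by a truncated HMAC-SHA256 tag), the choice of U+200B and U+200C as bit symbols, and the function names `make_identifier`, `embed`, `extract`, and `is_canary` are all assumptions made for illustration.

```python
import hashlib
import hmac

ZW0 = "\u200b"  # zero-width space, assumed to encode bit 0
ZW1 = "\u200c"  # zero-width non-joiner, assumed to encode bit 1
TAG_BYTES = 4   # truncated tag length (hypothetical choice)


def make_identifier(key: bytes, doc_id: bytes) -> bytes:
    """Return doc_id followed by a truncated HMAC-SHA256 tag over doc_id."""
    tag = hmac.new(key, doc_id, hashlib.sha256).digest()[:TAG_BYTES]
    return doc_id + tag


def embed(text: str, payload: bytes) -> str:
    """Hide the payload by adding one zero-width character after each space."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    out = []
    pos = 0
    for ch in text:
        out.append(ch)
        if ch == " " and pos < len(bits):
            out.append(ZW1 if bits[pos] else ZW0)
            pos += 1
    if pos < len(bits):
        raise ValueError("cover text has too few spaces for the payload")
    return "".join(out)


def extract(text: str, n_bytes: int):
    """Read the first n_bytes of hidden payload, or None if too few bits exist."""
    bits = [1 if ch == ZW1 else 0 for ch in text if ch in (ZW0, ZW1)]
    if len(bits) < n_bytes * 8:
        return None
    data = bytearray()
    for i in range(0, n_bytes * 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return bytes(data)


def is_canary(text: str, key: bytes, id_len: int = 4) -> bool:
    """Pre-ingestion check: recover the identifier and verify its tag."""
    payload = extract(text, id_len + TAG_BYTES)
    if payload is None:
        return False
    doc_id, tag = payload[:id_len], payload[id_len:]
    expected = hmac.new(key, doc_id, hashlib.sha256).digest()[:TAG_BYTES]
    return hmac.compare_digest(tag, expected)


key = b"demo-key"                      # placeholder secret for the demo
cover = " ".join(["word"] * 80)        # 79 spaces, enough for 64 payload bits
marked = embed(cover, make_identifier(key, b"\x00\x00\x00\x2a"))
print(is_canary(marked, key))          # True: identifier recovered and verified
print(is_canary(cover, key))           # False: no hidden bits present
```

Because this sketch relies on one channel, deleting the zero-width characters from `marked` erases the identifier. The framework described in the abstract layers several complementary channels instead of depending on a single one.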


Related papers