Fugu-MT Paper Translation (Summary): Countermind: A Multi-Layered Security Architecture for Large Language Models

Paper summary: Countermind: A Multi-Layered Security Architecture for Large Language Models

arxiv url: http://arxiv.org/abs/2510.11837v1
Date: Mon, 13 Oct 2025 18:41:18 GMT
Status: Translation complete
Last updated in system: 2025-10-15 19:02:32.064809
Title: Countermind: A Multi-Layered Security Architecture for Large Language Models
Authors: Dominik Schwarz
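Abstract summary: This paper proposes Countermind, a multi-layered security architecture. The architecture proposes a fortified perimeter designed to structurally validate and transform all inputs, and an internal governance mechanism intended to constrain the model's semantic processing pathways before an output is generated.
Reference score (independently computed attention score): 0.0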
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The security of Large Language Model (LLM) applications is fundamentally challenged by "form-first" attacks like prompt injection and jailbreaking, where malicious instructions are embedded within user inputs. Conventional defenses, which rely on post hoc output filtering, are often brittle and fail to address the root cause: the model's inability to distinguish trusted instructions from untrusted data. This paper proposes Countermind, a multi-layered security architecture intended to shift defenses from a reactive, post hoc posture to a proactive, pre-inference, and intra-inference enforcement model. The architecture proposes a fortified perimeter designed to structurally validate and transform all inputs, and an internal governance mechanism intended to constrain the model's semantic processing pathways before an output is generated. The primary contributions of this work are conceptual designs for: (1) A Semantic Boundary Logic (SBL) with a mandatory, time-coupled Text Crypter intended to reduce the plaintext prompt injection attack surface, provided all ingestion paths are enforced. (2) A Parameter-Space Restriction (PSR) mechanism, leveraging principles from representation engineering, to dynamically control the LLM's access to internal semantic clusters, with the goal of mitigating semantic drift and dangerous emergent behaviors. (3) A Secure, Self-Regulating Core that uses an OODA loop and a learning security module to adapt its defenses based on an immutable audit log. (4) A Multimodal Input Sandbox and Context-Defense mechanisms to address threats from non-textual data and long-term semantic poisoning. This paper outlines an evaluation plan designed to quantify the proposed architecture's effectiveness in reducing the Attack Success Rate (ASR) for form-first attacks and to measure its potential latency overhead.
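The abstract does not specify how the time-coupled Text Crypter is constructed. As a rough illustration of the underlying idea (only authenticated, time-bound instructions are treated as instructions, and everything else is inert data), the following Python sketch swaps in a standard technique: a time-windowed HMAC tag. It is not the paper's Text Crypter; the functions `seal` and `accept_as_instruction`, the shared key, and the 30-second `WINDOW_SECONDS` are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional, Tuple

# Hypothetical stand-in for a "time-coupled" check: NOT the paper's Text Crypter.
WINDOW_SECONDS = 30  # hypothetical length of one time window


def _window(now: float) -> int:
    """Index of the time window that contains `now` (seconds since the epoch)."""
    return int(now // WINDOW_SECONDS)


def _tag(key: bytes, window: int, instruction: str) -> str:
    """HMAC-SHA256 over the window index and the instruction text."""
    msg = f"{window}:{instruction}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def seal(key: bytes, instruction: str, now: Optional[float] = None) -> Tuple[str, str]:
    """Tag a trusted instruction for the current time window."""
    now = time.time() if now is None else now
    return instruction, _tag(key, _window(now), instruction)


def accept_as_instruction(key: bytes, instruction: str, tag: Optional[str],
                          now: Optional[float] = None) -> bool:
    """Treat text as an instruction only if it carries a fresh, valid tag.

    Untagged text (e.g. plaintext injected through user input) is rejected.
    The previous window is also accepted to tolerate small clock skew.
    """
    if tag is None:
        return False
    now = time.time() if now is None else now
    current = _window(now)
    return any(
        hmac.compare_digest(_tag(key, w, instruction), tag)
        for w in (current, current - 1)
    )
```

In this sketch, an instruction that arrives as plain text carries no valid tag and is rejected, and a captured tag stops verifying once its window (plus one window of skew tolerance) has passed. As the abstract notes for the actual design, such a check only reduces the attack surface if every ingestion path is forced through it.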
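The evaluation plan centers on two quantities: the Attack Success Rate (ASR), i.e. the fraction of attack attempts that succeed, and the latency the defenses add. A minimal Python sketch of how both could be computed from per-trial records follows; the `Trial` record format and the use of the median latency difference as the overhead measure are assumptions for illustration, not the paper's protocol.

```python
from dataclasses import dataclass
from statistics import median
from typing import Sequence


@dataclass
class Trial:
    """One attack attempt (hypothetical record format)."""
    attack_succeeded: bool  # did the attack achieve its goal?
    latency_ms: float       # end-to-end response time for this attempt


def attack_success_rate(trials: Sequence[Trial]) -> float:
    """ASR = successful attacks / total attack attempts."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.attack_succeeded for t in trials) / len(trials)


def latency_overhead_ms(baseline: Sequence[Trial], defended: Sequence[Trial]) -> float:
    """Median latency with defenses minus median latency without them."""
    if not baseline or not defended:
        raise ValueError("need trials for both configurations")
    return median(t.latency_ms for t in defended) - median(t.latency_ms for t in baseline)
```

Running the same attack set against an undefended baseline and against the defended configuration, then comparing the two `attack_success_rate` values, gives the ASR reduction. The median is used here because latency distributions are often skewed by a few slow requests.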
