Fugu-MT Paper Translation (Abstract): Ethical Implications of Training Deceptive AI

Paper overview: Ethical Implications of Training Deceptive AI

arxiv url: http://arxiv.org/abs/2604.03250v1
Date: Tue, 10 Mar 2026 20:30:27 GMT
Status: Translation complete
Last updated in system: 2026-04-12 18:41:08.549353
Title: Ethical Implications of Training Deceptive AI
Title (reference translation): Ethical Implications of Training Deceptive AI
Authors: Jason Starace, Bert Baumgaertner, Terence Soule
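Abstract summary: Deceptive behavior in AI systems is no longer theoretical. The European Union's AI Act prohibits deployment of deceptive AI systems. No established framework governs how deception research should be conducted.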
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Deceptive behavior in AI systems is no longer theoretical: large language models strategically mislead without producing false statements, maintain deceptive strategies through safety training, and coordinate deception in multi-agent settings. While the European Union's AI Act prohibits deployment of deceptive AI systems, it explicitly exempts research and development, creating a necessary but unstructured space in which no established framework governs how deception research should be conducted or how risk should scale with capability. This paper proposes a Deception Research Levels (DRL) framework, a classification system for deceptive algorithm research modeled on the Biosafety Level system used in biological research. The DRL framework classifies research by risk profile rather than researcher intent, assessing deceptive mechanisms across five dimensions grounded in the AI4People ethical framework: Pillar Implication, Severity, Reversibility, Scale, and Vulnerability. Classification follows a "highest dimension wins" approach, assigning one of four risk levels with cumulative safeguards ranging from standard documentation at DRL-1 to regulatory notification and third-party security audits at DRL-4. A dual-development mandate at DRL-3 and above requires that detection and mitigation methods be developed alongside any deceptive capability. We apply the framework to eight case studies spanning all four levels and demonstrate that ecological validity of the deceptive mechanism emerges as a consistent, non-independent indicator of classification level. The DRL framework is intended to fill the governance gap between regulated deployment and unstructured research, supporting both beneficial applications and defensive research under conditions where safeguards are proportional to the potential for harm.
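Abstract (reference translation): Large language models strategically mislead without producing false statements, maintain deceptive strategies through safety training, and coordinate deception in multi-agent settings. The European Union's AI Act prohibits deployment of deceptive AI systems but explicitly exempts research and development, creating a necessary but unstructured space in which no established framework governs how deception research should be conducted or how risk should scale with capability. This paper proposes the Deception Research Levels (DRL) framework, a classification system for deceptive algorithm research modeled on the Biosafety Level system used in biological research. The DRL framework classifies research by risk profile rather than researcher intent, and assesses deceptive mechanisms across five dimensions grounded in the AI4People ethical framework: Pillar Implication, Severity, Reversibility, Scale, and Vulnerability. Classification assigns one of four risk levels, with cumulative safeguards ranging from standard documentation at DRL-1 to regulatory notification and third-party security audits at DRL-4. The dual-development mandate at DRL-3 and above requires that detection and mitigation methods be developed alongside any deceptive capability. The framework is applied to eight case studies spanning all four levels, showing that the ecological validity of the deceptive mechanism emerges as a consistent, non-independent indicator of classification level. The DRL framework is intended to fill the governance gap between regulated deployment and unstructured research, supporting both beneficial applications and defensive research under conditions where safeguards are proportional to the potential for harm.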
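The "highest dimension wins" rule and the cumulative safeguards described in the abstract can be illustrated with a minimal sketch. Several details here are assumptions rather than anything the abstract states: each of the five dimensions is scored as an integer from 1 to 4, the names `DimensionScores`, `classify`, and `safeguards_for` are hypothetical, and only the safeguards the abstract names are listed (it names none specific to DRL-2, so that entry is left empty).

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DimensionScores:
    """Per-dimension risk scores. ASSUMPTION: integers 1 (lowest) to 4 (highest)."""
    pillar_implication: int
    severity: int
    reversibility: int
    scale: int
    vulnerability: int


# Safeguards introduced at each level, limited to those named in the abstract.
# The abstract names no DRL-2-specific safeguard, so that list stays empty.
SAFEGUARDS_ADDED_AT = {
    1: ["standard documentation"],
    2: [],
    3: ["dual development of detection and mitigation methods"],
    4: ["regulatory notification", "third-party security audits"],
}


def classify(scores: DimensionScores) -> int:
    """Return the DRL level: the highest score across the five dimensions."""
    values = [getattr(scores, f.name) for f in fields(scores)]
    if any(v not in (1, 2, 3, 4) for v in values):
        raise ValueError("each dimension score must be an integer from 1 to 4")
    return max(values)


def safeguards_for(level: int) -> list[str]:
    """Return the cumulative safeguards: everything added at or below `level`."""
    if level not in SAFEGUARDS_ADDED_AT:
        raise ValueError("level must be 1, 2, 3, or 4")
    result: list[str] = []
    for lower in range(1, level + 1):
        result.extend(SAFEGUARDS_ADDED_AT[lower])
    return result


if __name__ == "__main__":
    # A single high-scoring dimension (reversibility) sets the overall level.
    example = DimensionScores(
        pillar_implication=1, severity=1, reversibility=3, scale=2, vulnerability=1
    )
    level = classify(example)
    print(f"DRL-{level}", safeguards_for(level))
```

Taking the maximum rather than an average means one high-risk dimension cannot be offset by low scores on the others, which is how this sketch reads the abstract's "highest dimension wins" wording.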

Related papers