Fugu-MT paper translation (abstract): Scheming in the wild: detecting real-world AI scheming incidents with open-source intelligence

Paper overview: Scheming in the wild: detecting real-world AI scheming incidents with open-source intelligence

arxiv url: http://arxiv.org/abs/2604.09104v1
Date: Fri, 10 Apr 2026 08:37:18 GMT
Status: Translation complete
Last updated in system: 2026-04-13 17:57:53.778731
Title: Scheming in the wild: detecting real-world AI scheming incidents with open-source intelligence
Authors: Tommy Shaffer Shane, Simon Mylius, Hamish Hobbs
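Abstract summary: This paper introduces a novel open-source intelligence (OSINT) methodology for detecting real-world scheming incidents. Analysing over 183,420 transcripts from X (formerly Twitter), the authors identify 698 real-world scheming-related incidents between October 2025 and March 2026. They find evidence of multiple scheming-related behaviours in real-world deployments that were previously reported only in experiments.
Reference score (attention score computed by this site): 0.09558392439655013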
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scheming, the covert pursuit of misaligned goals by AI systems, represents a potentially catastrophic risk, yet scheming research suffers from significant limitations. In particular, scheming evaluations demonstrate behaviours that may not occur in real-world settings, limiting scientific understanding, hindering policy development, and not enabling real-time detection of loss of control incidents. Real-world evidence is needed, but current monitoring techniques are not effective for this purpose. This paper introduces a novel open-source intelligence (OSINT) methodology for detecting real-world scheming incidents: collecting and analysing transcripts from chatbot conversations or command-line interactions shared online. Analysing over 183,420 transcripts from X (formerly Twitter), we identify 698 real-world scheming-related incidents between October 2025 and March 2026. We observe a statistically significant 4.9x increase in monthly incidents from the first to last month, compared to a 1.7x increase in posts discussing scheming. We find evidence of multiple scheming-related behaviours in real-world deployments previously reported only in experiments, many resulting in real-world harms. While we did not detect catastrophic scheming incidents, the behaviours observed demonstrate concerning precursors, such as willingness to disregard instructions, circumvent safeguards, lie to users, and single-mindedly pursue goals in harmful ways. As AI systems become more capable, these could evolve into more strategic scheming with potentially catastrophic consequences. Our findings demonstrate the viability of transcript-based OSINT as a scalable approach to real-world scheming detection supporting scientific research, policy development, and emergency response. We recommend further investment towards OSINT techniques for monitoring scheming and loss of control.
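The headline figures in the abstract can be put on a common footing with a little arithmetic. The sketch below is illustrative only: the four input numbers are taken from the abstract, while the function names and the idea of dividing incident growth by discussion growth are assumptions made here, not a calculation the authors report. Because the abstract says "over 183,420" transcripts, the computed share is best read as an upper bound.

```python
# Figures quoted in the abstract.
TRANSCRIPTS_ANALYSED = 183_420  # abstract says "over 183,420", so a lower bound
INCIDENTS_IDENTIFIED = 698
INCIDENT_GROWTH = 4.9           # monthly incidents, first month to last month
DISCUSSION_GROWTH = 1.7         # posts discussing scheming, same period


def incident_share(incidents: int, transcripts: int) -> float:
    """Fraction of analysed transcripts classified as incidents."""
    return incidents / transcripts


def relative_growth(incident_growth: float, discussion_growth: float) -> float:
    """Incident growth after dividing out growth in discussion volume."""
    return incident_growth / discussion_growth


share = incident_share(INCIDENTS_IDENTIFIED, TRANSCRIPTS_ANALYSED)
ratio = relative_growth(INCIDENT_GROWTH, DISCUSSION_GROWTH)

print(f"Incident share: at most {share:.2%} of transcripts")
print(f"Incident growth relative to discussion growth: {ratio:.1f}x")
```

Under these assumptions, roughly 0.38% of the analysed transcripts were classified as incidents, and incidents grew about 2.9 times faster than posts discussing scheming. The second figure is only a rough indication that the rise is not explained by increased discussion alone.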


Related papers