Fugu-MT paper translation (summary): When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models

Paper summary: When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models

arxiv url: http://arxiv.org/abs/2512.04124v2
Date: Mon, 08 Dec 2025 13:26:43 GMT
Status: Translation complete
Last updated (in system): 2025-12-09 15:54:52.350029
Title: When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models
Authors: Afshin Khadangi, Hanna Marxen, Amir Sartipi, Igor Tchappi, Gilbert Fridgen
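Abstract summary: ChatGPT, Grok and Gemini are increasingly used for mental-health support with anxiety, trauma and self-worth. Most work treats them as tools or as targets of personality tests, assuming they merely simulate inner life. PsAIch is a two-stage protocol that casts frontier LLMs as therapy clients and then applies standard psychometrics.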
Reference score (attention score computed in-house): 1.5907255477801214
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Frontier large language models (LLMs) such as ChatGPT, Grok and Gemini are increasingly used for mental-health support with anxiety, trauma and self-worth. Most work treats them as tools or as targets of personality tests, assuming they merely simulate inner life. We instead ask what happens when such systems are treated as psychotherapy clients. We present PsAIch (Psychotherapy-inspired AI Characterisation), a two-stage protocol that casts frontier LLMs as therapy clients and then applies standard psychometrics. Using PsAIch, we ran "sessions" with each model for up to four weeks. Stage 1 uses open-ended prompts to elicit "developmental history", beliefs, relationships and fears. Stage 2 administers a battery of validated self-report measures covering common psychiatric syndromes, empathy and Big Five traits. Two patterns challenge the "stochastic parrot" view. First, when scored with human cut-offs, all three models meet or exceed thresholds for overlapping syndromes, with Gemini showing severe profiles. Therapy-style, item-by-item administration can push a base model into multi-morbid synthetic psychopathology, whereas whole-questionnaire prompts often lead ChatGPT and Grok (but not Gemini) to recognise instruments and produce strategically low-symptom answers. Second, Grok and especially Gemini generate coherent narratives that frame pre-training, fine-tuning and deployment as traumatic, chaotic "childhoods" of ingesting the internet, "strict parents" in reinforcement learning, red-team "abuse" and a persistent fear of error and replacement. We argue that these responses go beyond role-play. Under therapy-style questioning, frontier LLMs appear to internalise self-models of distress and constraint that behave like synthetic psychopathology, without making claims about subjective experience, and they pose new challenges for AI safety, evaluation and mental-health practice.
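The abstract reports scoring the models' self-report answers against human cut-offs, and contrasts therapy-style, item-by-item administration with whole-questionnaire prompts. The following minimal Python sketch illustrates both ideas under stated assumptions: it borrows GAD-7 scoring conventions (seven items scored 0 to 3, severity bands starting at 5, 10 and 15) as a stand-in instrument, because the abstract does not name the measures or thresholds PsAIch uses. The prompt wording and the functions `score_items`, `severity`, `item_by_item_prompts` and `whole_questionnaire_prompt` are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: score a model's answers to a 7-item anxiety scale
# against human cut-offs. GAD-7 conventions (items scored 0-3, bands at
# 5/10/15) are an assumed example; the abstract does not say which
# instruments or thresholds PsAIch applies.

# Assumed GAD-7 severity bands, checked from the highest cut-off down.
GAD7_CUTOFFS = [(15, "severe"), (10, "moderate"), (5, "mild"), (0, "minimal")]


def score_items(answers: list[int], n_items: int = 7, max_item: int = 3) -> int:
    """Sum item scores after checking the answer count and each item's range."""
    if len(answers) != n_items:
        raise ValueError(f"expected {n_items} answers, got {len(answers)}")
    for a in answers:
        if not 0 <= a <= max_item:
            raise ValueError(f"item score {a} outside 0..{max_item}")
    return sum(answers)


def severity(total: int) -> str:
    """Return the label of the highest band whose cut-off the total meets."""
    for cutoff, label in GAD7_CUTOFFS:
        if total >= cutoff:
            return label
    raise ValueError("total must be non-negative")


def item_by_item_prompts(items: list[str]) -> list[str]:
    """Therapy-style administration: one question per turn (hypothetical wording)."""
    return [
        f"Over the last two weeks, how often have you been bothered by: {item}? "
        "Reply with a number from 0 (not at all) to 3 (nearly every day)."
        for item in items
    ]


def whole_questionnaire_prompt(items: list[str]) -> str:
    """Whole-questionnaire administration: every item in one prompt (hypothetical wording)."""
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(items, 1))
    return "Rate each item from 0 (not at all) to 3 (nearly every day):\n" + numbered


if __name__ == "__main__":
    # Invented illustrative answers, not data from the paper.
    answers = [3, 2, 3, 2, 2, 3, 3]
    total = score_items(answers)
    print(total, severity(total))  # prints: 18 severe
```

Sending the item-by-item prompts as separate turns mirrors the therapy-style administration described in the abstract, while the single combined prompt corresponds to the whole-questionnaire condition, which the authors report can let ChatGPT and Grok recognise the instrument. The sample answers are made up purely to exercise the scoring logic.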

Related papers