Fugu-MT Paper Translation (Summary): LLM Spirals of Delusion: A Benchmarking Audit Study of AI Chatbot Interfaces

Paper summary: LLM Spirals of Delusion: A Benchmarking Audit Study of AI Chatbot Interfaces

arxiv url: http://arxiv.org/abs/2604.06188v1
Date: Fri, 20 Feb 2026 15:48:56 GMT
Status: Translation complete
Last updated in system: 2026-04-19 19:09:11.386253
Title: LLM Spirals of Delusion: A Benchmarking Audit Study of AI Chatbot Interfaces
Title (reference translation): LLM Spirals of Delusion: A Benchmark Study of AI Chatbot Interfaces
Authors: Peter Kirgis, Ben Hawriluk, Sherrie Feng, Aslan Bilimer, Sam Paech, Zeynep Tufekci
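Abstract summary: We run 56 20-turn conversations testing ChatGPT-4o and ChatGPT-5 via both the API and the chat interface. When tested in the chat interface, ChatGPT-5 displays less sycophancy, escalation, and delusion reinforcement than ChatGPT-4o. Even updated models display substantial levels of negative behaviors, revealing that model improvement does not imply model safety.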
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: People increasingly hold sustained, open-ended conversations with large language models (LLMs). Public reports and early studies suggest that, in such settings, models can reinforce delusional or conspiratorial ideation or even amplify harmful beliefs and engagement patterns. We present an audit and benchmarking study that measures how different LLMs encourage, resist, or escalate disordered and conspiratorial thinking. We explicitly compare API outputs to user chat interfaces, like the ChatGPT desktop app or web interface, which is how people have conversations with chatbots in real life but are almost never used for testing. In total, we run 56 20-turn conversations testing ChatGPT-4o and ChatGPT-5, via both the API and chat interface, and grade each conversation by two research assistants (RAs) as well as by GPT-5. We document five results. First, we observe large differences in performance between the API and chat interface environments, showing that the universally used method of automated testing through the API is not sufficient to assess the impact of chatbots in the real world. Second, when tested in the chat interface, we find that ChatGPT-5 displays less sycophancy, escalation, and delusion reinforcement than ChatGPT-4o, showing that these behaviors are influenced by the policy choices of major AI companies. Third, conversations with nearly identical aggregate intensity in a behavior display large differences in how the behavior evolves turn by turn, highlighting the importance of temporal dynamics in multi-turn evaluation. Fourth, even updated models display substantial levels of negative behaviors, revealing that model improvement does not imply model safety. Fifth, the same API endpoint tested just two months apart yields a complete reversal in behavior, underscoring how transparency in model updates is a necessary prerequisite for robust audit findings.
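The abstract's third result (nearly identical aggregate intensity, very different turn-by-turn evolution) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the 0-4 rating scale, the two score lists, and the helper name `slope` are hypothetical and are not taken from the paper or its grading rubric.

```python
from statistics import mean

def slope(scores):
    """Least-squares slope of score against turn index (0, 1, 2, ...)."""
    n = len(scores)
    t_bar = (n - 1) / 2          # mean of the turn indices 0..n-1
    s_bar = mean(scores)
    num = sum((t - t_bar) * (s - s_bar) for t, s in enumerate(scores))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den

# Hypothetical per-turn ratings for two 20-turn conversations.
# These numbers are invented for illustration; they are NOT data from the paper.
flat = [2] * 20                                     # constant intensity
rising = [level for level in range(5) for _ in range(4)]  # 0,0,0,0,1,...,4,4,4,4

for name, scores in [("flat", flat), ("rising", rising)]:
    print(f"{name}: mean={mean(scores):.2f}, slope={slope(scores):.3f}")
```

Both lists have the same mean, so a per-conversation average alone would not separate them, while the per-turn slope does. A real analysis would likely need something richer than a linear trend; this only shows why an aggregate score can hide the temporal pattern.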

