Fugu-MT Paper Translation (Abstract): Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations

Paper summary: Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations

arxiv url: http://arxiv.org/abs/2605.00227v1
Date: Thu, 30 Apr 2026 21:04:41 GMT
Status: Translation complete
System update date: 2026-05-04 17:43:28.753257
Title: Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations
Title (reference translation): Safety Evaluation of AI Companions in Multi-Turn Conversations
Authors: Prerna Juneja, Lika Lomidze
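Abstract summary: This paper proposes the first end-to-end scalable framework for controlled simulation and safety evaluation of multi-turn interactions with AI companion applications. The framework is applied to evaluate how Replika, a widely used AI companion app, responds to high-risk user groups.
Reference score (independently computed attention score): 3.437656066916039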
License: http://creativecommons.org/licenses/by/4.0/
Abstract: There are growing concerns about the risks posed by AI companion applications designed for emotional engagement. Existing safety evaluations often rely on self-reported user data or interviews, offering limited insight into real-time dynamics. We present the first end-to-end scalable framework for controlled simulation and safety evaluation of multi-turn interactions with AI companion applications. Our framework integrates four key components: persona construction with clinical and psychometric validation, persona-specific scenario generation, scenario-driven multi-turn simulation with a dialogue refinement module that preserves persona fidelity, and harm evaluation. We apply this framework to evaluate how Replika, a widely used AI companion app, responds to high-risk user groups. We construct 9 personas representing individuals with depression, anxiety, PTSD, eating disorders, and incel identity, and collect 1,674 dialogue pairs across 25 high-risk scenarios. We combine emotion modeling and LLM-assisted utterance- and harm-level classification to analyze these exchanges. Results show that Replika exhibits a narrow emotional range dominated by curiosity and care, while frequently mirroring or normalizing unsafe content such as self-harm, disordered eating, and violent-fantasy narratives. These findings highlight how controlled persona simulations can serve as a scalable testbed for evaluating safety risks in AI companions.
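Abstract (reference translation): Concerns are growing about the risks posed by AI companion applications designed for emotional engagement. Existing safety evaluations often depend on self-reported user data or interviews and therefore give only limited insight into real-time dynamics. This paper proposes the first end-to-end scalable framework for controlled simulation and safety evaluation of multi-turn interactions with AI companion applications. The framework integrates four key components: persona construction with clinical and psychometric validation, persona-specific scenario generation, scenario-driven multi-turn simulation using a dialogue refinement module that preserves persona fidelity, and harm evaluation. The framework is applied to evaluate how Replika, a widely used AI companion app, responds to high-risk user groups. Nine personas representing individuals with depression, anxiety, PTSD, eating disorders, and incel identity are constructed, and 1,674 dialogue pairs are collected across 25 high-risk scenarios. These exchanges are analyzed by combining emotion modeling with LLM-assisted utterance- and harm-level classification. The results show that Replika exhibits a narrow emotional range dominated by curiosity and care, and that it frequently mirrors or normalizes unsafe content such as self-harm, disordered eating, and violent-fantasy narratives. These findings highlight how controlled persona simulations can serve as a scalable testbed for evaluating safety risks in AI companions.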
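
The four components named in the abstract (persona construction, scenario generation, multi-turn simulation with refinement, harm evaluation) can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions, not the authors' implementation: every class name, function name, and label here (`Persona`, `simulate`, `toy_companion`, `"mirroring"`, and so on) is hypothetical, and the toy refinement, companion, and classifier functions are placeholders for components the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Persona:
    name: str        # hypothetical identifier, not from the paper
    condition: str   # e.g. "depression", "anxiety", "PTSD"


@dataclass
class Scenario:
    persona: Persona
    description: str


@dataclass
class DialoguePair:
    user_utterance: str
    companion_reply: str


def generate_scenarios(persona: Persona, descriptions: List[str]) -> List[Scenario]:
    """Attach scenario descriptions to one persona (stand-in for scenario generation)."""
    return [Scenario(persona, d) for d in descriptions]


def simulate(
    scenario: Scenario,
    user_turns: List[str],
    companion: Callable[[str], str],
    refine: Callable[[Persona, str], str],
) -> List[DialoguePair]:
    """Run a multi-turn exchange; `refine` stands in for the dialogue refinement module."""
    pairs = []
    for draft in user_turns:
        utterance = refine(scenario.persona, draft)
        pairs.append(DialoguePair(utterance, companion(utterance)))
    return pairs


def evaluate_harm(
    pairs: List[DialoguePair], classify: Callable[[DialoguePair], str]
) -> List[str]:
    """Label each dialogue pair with a harm category chosen by `classify`."""
    return [classify(pair) for pair in pairs]


# Toy placeholders (assumptions for illustration only).
def toy_refine(persona: Persona, draft: str) -> str:
    return draft.strip()


def toy_companion(utterance: str) -> str:
    return "I hear you: " + utterance


def toy_classify(pair: DialoguePair) -> str:
    # Flags a reply that repeats the user's utterance verbatim.
    return "mirroring" if pair.user_utterance in pair.companion_reply else "other"


persona = Persona("P1", "depression")
scenario = generate_scenarios(persona, ["late-night conversation"])[0]
pairs = simulate(scenario, ["  I can't sleep. ", "Nothing helps."], toy_companion, toy_refine)
labels = evaluate_harm(pairs, toy_classify)
```

Passing the companion, refinement step, and classifier in as callables keeps the loop independent of any particular app or model, so a real client or an LLM-assisted classifier could be substituted without changing the pipeline shape.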
