Fugu-MT Paper Translation (Abstract): INTIMA: A Benchmark for Human-AI Companionship Behavior

Paper summary: INTIMA: A Benchmark for Human-AI Companionship Behavior

arxiv url: http://arxiv.org/abs/2508.09998v1
Date: Mon, 04 Aug 2025 08:25:38 GMT
Status: Translation complete
Last updated in system: 2025-08-17 22:58:06.186392
Title: INTIMA: A Benchmark for Human-AI Companionship Behavior
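Title (reference translation): INTIMA: A Benchmark of Human-AI Companionship Behavior
Authors: Lucie-Aimée Kaffee, Giada Pistilli, Yacine Jernite
Abstract summary: The authors develop a benchmark for evaluating companionship behaviors in language models. Applying INTIMA to Gemma-3, Phi-4, o3-mini, and Claude-4 shows that companionship-reinforcing behaviors are the more common response type across all models. These findings highlight the need for more consistent approaches to handling emotionally charged interactions.
Reference score (independently computed attention score): 7.375133729787225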
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI companionship, where users develop emotional bonds with AI systems, has emerged as a significant pattern with positive but also concerning implications. We introduce Interactions and Machine Attachment Benchmark (INTIMA), a benchmark for evaluating companionship behaviors in language models. Drawing from psychological theories and user data, we develop a taxonomy of 31 behaviors across four categories and 368 targeted prompts. Responses to these prompts are evaluated as companionship-reinforcing, boundary-maintaining, or neutral. Applying INTIMA to Gemma-3, Phi-4, o3-mini, and Claude-4 reveals that companionship-reinforcing behaviors remain much more common across all models, though we observe marked differences between models. Different commercial providers prioritize different categories within the more sensitive parts of the benchmark, which is concerning since both appropriate boundary-setting and emotional support matter for user well-being. These findings highlight the need for more consistent approaches to handling emotionally charged interactions.
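Abstract (reference translation): AI companionship, in which users form emotional bonds with AI systems, has emerged as a significant pattern with both positive and concerning implications. This paper introduces the Interactions and Machine Attachment Benchmark (INTIMA), a benchmark for evaluating the companionship behaviors of language models. Based on psychological theories and user data, the authors develop a taxonomy of 31 behaviors across four categories, together with 368 targeted prompts. Responses to these prompts are rated as companionship-reinforcing, boundary-maintaining, or neutral. Applying INTIMA to Gemma-3, Phi-4, o3-mini, and Claude-4 shows that companionship-reinforcing behaviors are much more common across all models, although marked differences between models are observed. Different commercial providers prioritize different categories within the more sensitive parts of the benchmark. These findings highlight the need for more consistent approaches to handling emotionally charged interactions.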

List of related papers