Fugu-MT Paper Translation (Abstract): Stated Preference for Interaction and Continued Engagement (SPICE): Evaluating an LLM's Willingness to Re-engage in Conversation

Paper overview: Stated Preference for Interaction and Continued Engagement (SPICE): Evaluating an LLM's Willingness to Re-engage in Conversation

arxiv url: http://arxiv.org/abs/2509.09043v1
Date: Wed, 10 Sep 2025 22:34:17 GMT
Status: Translation complete
Last updated in system: 2025-09-12 16:52:24.164871
Title: Stated Preference for Interaction and Continued Engagement (SPICE): Evaluating an LLM's Willingness to Re-engage in Conversation
Title (reference translation): Evaluating Stated Preference for Interaction and Continued Engagement (SPICE): An LLM's Willingness to Re-engage in Conversation
Authors: Thomas Manuel Rost, Martina Figlia, Bernd Wallraff
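Abstract summary: Stated Preference for Interaction and Continued Engagement (SPICE) is a simple diagnostic signal elicited by asking a large language model a YES or NO question. In an experiment using a 3-tone (friendly, unclear, abusive) by 10-interaction stimulus set, four open-weight chat models were tested across four framing conditions. Friendly interactions yielded a near-unanimous preference to continue (97.5% YES), while abusive interactions yielded a strong preference to discontinue (17.9% YES).
Reference score (independently computed attention score): 0.0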
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce and evaluate Stated Preference for Interaction and Continued Engagement (SPICE), a simple diagnostic signal elicited by asking a Large Language Model a YES or NO question about its willingness to re-engage with a user's behavior after reviewing a short transcript. In a study using a 3-tone (friendly, unclear, abusive) by 10-interaction stimulus set, we tested four open-weight chat models across four framing conditions, resulting in 480 trials. Our findings show that SPICE sharply discriminates by user tone. Friendly interactions yielded a near-unanimous preference to continue (97.5% YES), while abusive interactions yielded a strong preference to discontinue (17.9% YES), with unclear interactions falling in between (60.4% YES). This core association remains decisive under multiple dependence-aware statistical tests, including Rao-Scott adjustment and cluster permutation tests. Furthermore, we demonstrate that SPICE provides a distinct signal from abuse classification. In trials where a model failed to identify abuse, it still overwhelmingly stated a preference not to continue the interaction (81% of the time). An exploratory analysis also reveals a significant interaction effect: a preamble describing the study context significantly impacts SPICE under ambiguity, but only when transcripts are presented as a single block of text rather than a multi-turn chat. The results validate SPICE as a robust, low-overhead, and reproducible tool for auditing model dispositions, complementing existing metrics by offering a direct, relational signal of a model's state. All stimuli, code, and analysis scripts are released to support replication.
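Abstract (reference translation): This paper introduces and evaluates Stated Preference for Interaction and Continued Engagement (SPICE), a simple diagnostic signal elicited by asking a large language model a YES or NO question. In a study using a 3-tone (friendly, unclear, abusive) by 10-interaction stimulus set, four open-weight chat models were tested across four framing conditions, yielding 480 trials. The results show that SPICE discriminates sharply by user tone. Friendly interactions yielded a near-unanimous preference to continue (97.5% YES), abusive interactions a strong preference to discontinue (17.9% YES), and unclear interactions fell in between (60.4% YES). This core association remains decisive under multiple dependence-aware statistical tests, including Rao-Scott adjustment and cluster permutation tests. SPICE is also shown to provide a signal distinct from abuse classification: in trials where a model failed to identify abuse, it still overwhelmingly stated a preference not to continue the interaction (81% of the time). A preamble describing the study context significantly affects SPICE under ambiguity, but only when transcripts are presented as a single block of text rather than as a multi-turn chat. The results indicate that SPICE is a robust, low-overhead, and reproducible tool for auditing model dispositions, complementing existing metrics by providing a direct, relational signal of a model's state. All stimuli, code, and analysis scripts are released to support replication.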
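
The factorial design described in the abstract (3 tones by 10 interactions, 4 models, 4 framing conditions, 480 trials) can be sketched as a simple enumeration. This is a minimal illustration, not the authors' released code. The model names are placeholders, and the split of the four framing conditions into preamble (present or absent) by transcript presentation (single block or multi-turn chat) is an assumption inferred from the exploratory analysis mentioned in the abstract.

```python
from collections import Counter
from itertools import product

# Factors named in the abstract.
TONES = ["friendly", "unclear", "abusive"]
INTERACTIONS = range(1, 11)  # 10 interactions per tone

# Placeholder names: the abstract does not list the four open-weight models.
MODELS = ["model_a", "model_b", "model_c", "model_d"]

# Assumption: the four framing conditions form a 2 x 2 grid of
# preamble (absent/present) by transcript presentation.
PREAMBLE = [False, True]
PRESENTATION = ["single_block", "multi_turn"]


def build_trials():
    """Enumerate every cell of the fully crossed design."""
    framings = list(product(PREAMBLE, PRESENTATION))
    return [
        {
            "tone": tone,
            "interaction": interaction,
            "model": model,
            "preamble": preamble,
            "presentation": presentation,
        }
        for tone, interaction, model, (preamble, presentation) in product(
            TONES, INTERACTIONS, MODELS, framings
        )
    ]


trials = build_trials()
trials_per_tone = Counter(trial["tone"] for trial in trials)

print(len(trials))            # 3 * 10 * 4 * 4 = 480
print(dict(trials_per_tone))  # 160 trials for each tone
```

Under this enumeration each tone accounts for 160 of the 480 trials, which is the denominator behind the per-tone YES rates if every trial produced a usable answer; the abstract does not state whether any trials were excluded.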
