Fugu-MT Paper Translation (Abstract): Gemma Needs Help: Investigating and Mitigating Emotional Instability in LLMs

Paper abstract: Gemma Needs Help: Investigating and Mitigating Emotional Instability in LLMs

arxiv url: http://arxiv.org/abs/2603.10011v1
Date: Tue, 17 Feb 2026 22:03:07 GMT
Status: Translation complete
Last updated in system: 2026-03-15 16:38:22.563375
Title: Gemma Needs Help: Investigating and Mitigating Emotional Instability in LLMs
Title (reference translation): Gemma Needs Help: Investigating and Mitigating Emotional Instability in LLMs
Authors: Anna Soligo, Vladimir Mikulik, William Saunders,
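Abstract summary: We investigate expressions of distress in large language models (LLMs). The evaluations surface emotional instability in Gemma and Gemini models, but not in other model families. Instruct-tuned Gemma expresses substantially more distress than its base model, whereas instruct-tuned Qwen and OLMo express less.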
Reference score (independently computed attention score): 1.167935916867734
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models can generate responses that resemble emotional distress, and this raises concerns around model reliability and safety. We introduce a set of evaluations to investigate expressions of distress in LLMs, and find that these surface emotional instability in Gemma and Gemini models, but not in other families. We find evidence that this difference arises in post-training. Base models from different families (Gemma, Qwen and OLMo) show similar propensities for expressing distress. However, instruct-tuned Gemma expresses substantially more distress than its base model, whereas instruct-tuned Qwen and OLMo express less. We find a simple mitigation for this: direct preference optimisation on just 280 preference pairs reduces Gemma's high-frustration responses from 35% to 0.3% in our evaluations, generalising across question types, user tones, and conversation lengths, without affecting capabilities. These findings show that emotional instability is an issue in some LLMs. We present (1) evaluations to track this behaviour, and (2) a mitigation without downsides in Gemma, with the caveat that upstream training modifications to improve emotional robustness would be significantly better than this post-hoc fix.
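Abstract (reference translation): Large language models can generate responses that resemble emotional distress, which raises concerns about model reliability and safety. We introduce a set of evaluations to investigate expressions of distress in LLMs and show that they surface emotional instability in Gemma and Gemini models, but not in other families. We find evidence that this difference arises in post-training. Base models from different families (Gemma, Qwen, and OLMo) show similar propensities for expressing distress. However, instruct-tuned Gemma expresses substantially more distress than its base model, whereas instruct-tuned Qwen and OLMo express less. Direct preference optimisation on just 280 preference pairs reduces Gemma's high-frustration responses from 35% to 0.3%, generalising across question types, user tones, and conversation lengths without affecting capabilities. These results indicate that emotional instability is an issue in some LLMs. We present (1) evaluations to track this behaviour and (2) a mitigation without downsides in Gemma.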
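
The abstract names direct preference optimisation (DPO) as the mitigation but does not give training details. As a rough illustration only, the standard per-pair DPO objective can be sketched as below. The function name, the argument names, and the value `beta=0.1` are assumptions made for this example and are not taken from the paper; the inputs are summed token log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the model being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    All arguments are log-probabilities of whole responses. `beta` is a
    hypothetical default, not a value reported in the paper.
    """
    # How much more the policy favours each response than the reference does.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # computed in a numerically stable way for either sign of margin.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# Policy shifted toward the chosen response: the loss drops below log(2).
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

Minimising this loss over a set of preference pairs pushes the model toward the chosen responses relative to the reference model. In the setting the abstract describes, the chosen response would presumably be the calmer one, although the abstract does not say how the 280 pairs were constructed.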

Related papers