Fugu-MT Paper Translation (Summary): Large Language Models are Highly Aligned with Human Ratings of Emotional Stimuli

Paper summary: Large Language Models are Highly Aligned with Human Ratings of Emotional Stimuli

arxiv url: http://arxiv.org/abs/2508.14214v1
Date: Tue, 19 Aug 2025 19:22:00 GMT
Status: Translation complete
System update date: 2025-08-21 16:52:41.251266
Title: Large Language Models are Highly Aligned with Human Ratings of Emotional Stimuli
Authors: Mattson Ogg, Chace Ashcraft, Ritwik Bose, Raphael Norman-Tenazas, Michael Wolmetz,
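Abstract summary: Emotions exert a large influence on human behavior and cognition in both commonplace and high-stress tasks. Discussions of integrating large language models into everyday life should be informed by an understanding of how these models evaluate emotionally loaded stimuli or situations. The alignment between model and human behavior in these cases can inform how effective LLMs are for particular roles or interactions.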
Reference score (site-calculated attention score): 0.62914438169038
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Emotions exert an immense influence over human behavior and cognition in both commonplace and high-stress tasks. Discussions of whether or how to integrate large language models (LLMs) into everyday life (e.g., acting as proxies for, or interacting with, human agents) should be informed by an understanding of how these tools evaluate emotionally loaded stimuli or situations. A model's alignment with human behavior in these cases can inform the effectiveness of LLMs for certain roles or interactions. To help build this understanding, we elicited ratings from multiple popular LLMs for datasets of words and images that were previously rated for their emotional content by humans. We found that when performing the same rating tasks, GPT-4o responded very similarly to human participants across modalities, stimuli and most rating scales (r = 0.9 or higher in many cases). However, arousal ratings were less well aligned between human and LLM raters, while happiness ratings were most highly aligned. Overall, LLMs aligned better within a five-category (happiness, anger, sadness, fear, disgust) emotion framework than within a two-dimensional (arousal and valence) organization. Finally, LLM ratings were substantially more homogeneous than human ratings. Together these results begin to describe how LLM agents interpret emotional stimuli and highlight similarities and differences between biological and artificial intelligence in key behavioral domains.
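The abstract reports alignment as correlation coefficients (r = 0.9 or higher in many cases) and describes LLM ratings as more homogeneous than human ratings, but it does not spell out the exact computation. The sketch below is a minimal illustration under stated assumptions, not the authors' method: alignment is taken as Pearson's r between per-stimulus mean ratings, and homogeneity as the average, over stimuli, of the standard deviation across raters. The function names and the rating data are hypothetical toy values, not results from the paper.

```python
import math
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_rater_spread(ratings_per_stimulus):
    """Average over stimuli of the population SD across raters (lower = more homogeneous)."""
    return mean(pstdev(r) for r in ratings_per_stimulus)


# Hypothetical toy data (assumption): rows = stimuli, columns = individual raters
# (or repeated LLM samples) on an assumed 1-9 happiness scale.
human = [[8, 7, 9, 6], [2, 3, 1, 4], [5, 6, 4, 5]]
llm = [[8, 8, 8, 8], [2, 2, 3, 2], [5, 5, 5, 5]]

# Alignment: correlate per-stimulus mean ratings of the two rater groups.
human_means = [mean(r) for r in human]
llm_means = [mean(r) for r in llm]
print("alignment r:", round(pearson_r(human_means, llm_means), 3))

# Homogeneity: compare how much raters disagree on each stimulus, on average.
print("human spread:", round(mean_rater_spread(human), 3))
print("LLM spread:", round(mean_rater_spread(llm), 3))
```

Averaging ratings per stimulus before correlating is one common choice; correlating individual ratings instead would typically yield lower r values, so the choice affects how "highly aligned" a comparison looks.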


Related papers