Large Language Models are Highly Aligned with Human Ratings of Emotional Stimuli
- URL: http://arxiv.org/abs/2508.14214v1
- Date: Tue, 19 Aug 2025 19:22:00 GMT
- Title: Large Language Models are Highly Aligned with Human Ratings of Emotional Stimuli
- Authors: Mattson Ogg, Chace Ashcraft, Ritwik Bose, Raphael Norman-Tenazas, Michael Wolmetz,
- Abstract summary: Emotions exert an immense influence over human behavior and cognition in both commonplace and high-stress tasks. Discussions should be informed by an understanding of how large language models evaluate emotionally loaded stimuli or situations. A model's alignment with human behavior in these cases can inform the effectiveness of LLMs for certain roles or interactions.
- Score: 0.62914438169038
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emotions exert an immense influence over human behavior and cognition in both commonplace and high-stress tasks. Discussions of whether or how to integrate large language models (LLMs) into everyday life (e.g., acting as proxies for, or interacting with, human agents) should be informed by an understanding of how these tools evaluate emotionally loaded stimuli or situations. A model's alignment with human behavior in these cases can inform the effectiveness of LLMs for certain roles or interactions. To help build this understanding, we elicited ratings from multiple popular LLMs for datasets of words and images that were previously rated for their emotional content by humans. We found that when performing the same rating tasks, GPT-4o responded very similarly to human participants across modalities, stimuli and most rating scales (r = 0.9 or higher in many cases). However, arousal ratings were less well aligned between human and LLM raters, while happiness ratings were most highly aligned. Overall, LLMs aligned better within a five-category (happiness, anger, sadness, fear, disgust) emotion framework than within a two-dimensional (arousal and valence) organization. Finally, LLM ratings were substantially more homogeneous than human ratings. Together these results begin to describe how LLM agents interpret emotional stimuli and highlight similarities and differences among biological and artificial intelligence in key behavioral domains.
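The abstract reports alignment as a correlation (r) between human and LLM ratings and describes LLM ratings as more homogeneous. The sketch below shows one simple way to compute both quantities, assuming each stimulus has ratings from several raters: alignment as the Pearson r between per-stimulus mean ratings, and homogeneity proxied by the average within-stimulus standard deviation (lower means more homogeneous). The ratings are invented toy values, the helper names are hypothetical, and this is not claimed to be the paper's actual analysis pipeline.

```python
"""Sketch: human-LLM rating alignment and homogeneity on toy data.

All ratings below are made up for illustration; they are NOT the paper's data,
and the helpers are hypothetical, not the authors' code.
"""
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def mean_within_stimulus_sd(ratings):
    """Average spread across raters; `ratings` holds one list per stimulus.

    Lower values mean raters agree more, i.e. ratings are more homogeneous.
    """
    return mean(pstdev(per_stimulus) for per_stimulus in ratings)


# Hypothetical 1-9 valence ratings for four words, three raters each.
human = [[8, 7, 9], [2, 4, 1], [5, 6, 4], [7, 5, 8]]
llm = [[8, 8, 8], [2, 2, 3], [5, 5, 5], [7, 7, 7]]

# Alignment is computed on per-stimulus means, one value per stimulus.
human_means = [mean(r) for r in human]
llm_means = [mean(r) for r in llm]

print(f"alignment r  = {pearson_r(human_means, llm_means):.3f}")
print(f"human spread = {mean_within_stimulus_sd(human):.3f}")
print(f"LLM spread   = {mean_within_stimulus_sd(llm):.3f}")
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient; the explicit version is written out here only to make the formula visible.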
Related papers
- Fluent but Unfeeling: The Emotional Blind Spots of Language Models [1.248728117157669]
A critical gap remains in evaluating whether Large Language Models (LLMs) align with human emotions at a fine-grained level.
We introduce Express, a benchmark dataset curated from Reddit communities featuring 251 fine-grained, self-disclosed emotion labels.
Our comprehensive evaluation framework examines predicted emotion terms and decomposes them into eight basic emotions using established emotion theories.
(2025-09-11T16:31:13Z)
- SocialEval: Evaluating Social Intelligence of Large Language Models [70.90981021629021]
Social Intelligence (SI) equips humans with interpersonal abilities to behave wisely in navigating social interactions to achieve social goals.
This presents an operational evaluation paradigm: outcome-oriented goal achievement evaluation and process-oriented interpersonal ability evaluation.
We propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts.
(2025-06-01T08:36:51Z)
- Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models [75.85319609088354]
Sentient Agent as a Judge (SAGE) is an evaluation framework for large language models.
SAGE instantiates a Sentient Agent that simulates human-like emotional changes and inner thoughts during interaction.
SAGE provides a principled, scalable and interpretable tool for tracking progress toward genuinely empathetic and socially adept language agents.
(2025-05-01T19:06:10Z)
- How Deep is Love in LLMs' Hearts? Exploring Semantic Size in Human-like Cognition [75.11808682808065]
This study investigates whether large language models (LLMs) exhibit similar tendencies in understanding semantic size.
Our findings reveal that multi-modal training is crucial for LLMs to achieve more human-like understanding.
Lastly, we examine whether LLMs are influenced by attention-grabbing headlines with larger semantic sizes in a real-world web shopping scenario.
(2025-03-01T03:35:56Z)
- Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
We study how reliance is affected by contextual features of an interaction.
We find that contextual characteristics significantly affect human reliance behavior.
Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions.
(2024-07-10T18:00:05Z)
- Modeling Human Subjectivity in LLMs Using Explicit and Implicit Human Factors in Personas [14.650234624251716]
Large language models (LLMs) are increasingly being used in human-centered social scientific tasks.
These tasks are highly subjective and dependent on human factors, such as one's environment, attitudes, beliefs, and lived experiences.
We examine the role of prompting LLMs with human-like personas and ask the models to answer as if they were a specific human.
(2024-06-20T16:24:07Z)
- Are Large Language Models Aligned with People's Social Intuitions for Human-Robot Interactions? [7.308479353736709]
Large language models (LLMs) are increasingly used in robotics, especially for high-level action planning.
In this work, we test whether LLMs reproduce people's intuitions and communication in human-robot interaction scenarios.
We show that vision models fail to capture the essence of video stimuli and that LLMs tend to rate different communicative acts and behavior higher than people.
(2024-03-08T22:23:23Z)
- Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench [83.41621219298489]
We evaluate Large Language Models' (LLMs) anthropomorphic capabilities using the emotion appraisal theory from psychology.
We collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study.
We conduct a human evaluation involving more than 1,200 subjects worldwide.
(2023-08-07T15:18:30Z)
- Emotional Intelligence of Large Language Models [9.834823298632374]
Large Language Models (LLMs) have demonstrated remarkable abilities across numerous disciplines.
However, their alignment with human emotions and values, which is critical for real-world applications, has not been systematically evaluated.
Here, we assessed LLMs' Emotional Intelligence (EI), encompassing emotion recognition, interpretation, and understanding.
(2023-07-18T07:49:38Z)
- Large Language Models Understand and Can be Enhanced by Emotional Stimuli [53.53886609012119]
We take the first step towards exploring the ability of Large Language Models to understand emotional stimuli.
Our experiments show that LLMs have a grasp of emotional intelligence, and their performance can be improved with emotional prompts.
Our human study results demonstrate that EmotionPrompt significantly boosts the performance of generative tasks.
(2023-07-14T00:57:12Z)
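The EmotionPrompt entry describes improving LLM performance by adding emotional stimuli to prompts. A minimal sketch of that construction, assuming the stimulus is simply appended after an otherwise unchanged task prompt; the helper `emotion_prompt` and its default phrase are illustrative assumptions, not the paper's code or its full stimulus set.

```python
"""Sketch of EmotionPrompt-style prompt construction (hypothetical helper)."""

# Illustrative stimulus phrase; the EmotionPrompt paper defines its own set.
DEFAULT_STIMULUS = "This is very important to my career."


def emotion_prompt(task_prompt: str, stimulus: str = DEFAULT_STIMULUS) -> str:
    """Return the task prompt with an emotional stimulus appended.

    The original instruction is kept verbatim apart from trailing whitespace,
    so any performance difference can be attributed to the added phrase.
    """
    return f"{task_prompt.rstrip()} {stimulus}"


base = "Determine whether this review is positive or negative: 'I loved it.'"
print(emotion_prompt(base))
```

Keeping the base prompt untouched makes it straightforward to compare model outputs with and without the stimulus on the same task.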