Artificial Intelligence Can Emulate Human Normative Judgments on Emotional Visual Scenes
- URL: http://arxiv.org/abs/2503.18796v1
- Date: Mon, 24 Mar 2025 15:41:23 GMT
- Title: Artificial Intelligence Can Emulate Human Normative Judgments on Emotional Visual Scenes
- Authors: Zaira Romeo, Alberto Testolin
- Abstract summary: We study whether state-of-the-art multimodal systems can emulate human emotional ratings on a standardized set of images. The AI judgements correlate surprisingly well with the average human ratings.
- Score: 0.09208007322096533
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Affective reactions have deep biological foundations; however, in humans the development of emotion concepts is also shaped by language and higher-order cognition. A recent breakthrough in AI has been the creation of multimodal language models that exhibit impressive intellectual capabilities, but their responses to affective stimuli have not been investigated. Here we study whether state-of-the-art multimodal systems can emulate human emotional ratings on a standardized set of images, in terms of affective dimensions and basic discrete emotions. The AI judgements correlate surprisingly well with the average human ratings: given that these systems were not explicitly trained to match human affective reactions, this suggests that the ability to visually judge emotional content can emerge from statistical learning over large-scale databases of images paired with linguistic descriptions. Besides showing that language can support the development of rich emotion concepts in AI, these findings have broad implications for the sensitive use of multimodal AI technology.
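To make the evaluation protocol implied by the abstract concrete, the sketch below shows one way to collect a multimodal model's ratings on the affective dimensions used by standardized picture databases (valence, arousal, dominance) and correlate them with average human norms. The prompt wording, the rating function, and the norm values are illustrative assumptions, not the authors' code or data.

```python
# Hypothetical sketch: ask a multimodal model to rate each image on the
# affective dimensions used by standardized picture databases, then correlate
# the model's ratings with the average human norms for the same images.
# rate_image, PROMPT and HUMAN_NORMS are placeholders, not the authors' setup.

from scipy.stats import pearsonr

PROMPT = (
    "Rate this image on a 1-9 scale for valence (unpleasant-pleasant), "
    "arousal (calm-excited) and dominance (controlled-in control). "
    "Answer with three numbers."
)

def rate_image(image_path: str) -> dict[str, float]:
    """Placeholder for a call to a multimodal model with PROMPT and the image,
    returning one rating per affective dimension."""
    raise NotImplementedError("plug in a multimodal model here")

# Average human ratings for the same images (illustrative values only).
HUMAN_NORMS = {
    "img_001.jpg": {"valence": 7.2, "arousal": 5.1, "dominance": 6.0},
    "img_002.jpg": {"valence": 2.4, "arousal": 6.8, "dominance": 3.1},
}

def correlate(dimension: str) -> float:
    """Pearson correlation between model ratings and average human norms."""
    images = sorted(HUMAN_NORMS)
    model = [rate_image(img)[dimension] for img in images]
    human = [HUMAN_NORMS[img][dimension] for img in images]
    r, _p = pearsonr(model, human)
    return r
```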
Related papers
- MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z)
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
- Improved Emotional Alignment of AI and Humans: Human Ratings of Emotions Expressed by Stable Diffusion v1, DALL-E 2, and DALL-E 3 [10.76478480925475]
Generative AI systems are increasingly capable of expressing emotions via text and imagery.
We measure the alignment between emotions expressed by generative AI and human perceptions.
We show that the alignment significantly depends upon the AI model used and the emotion itself.
arXiv Detail & Related papers (2024-05-28T18:26:57Z)
- The Good, The Bad, and Why: Unveiling Emotions in Generative AI [73.94035652867618]
We show that EmotionPrompt can boost the performance of AI models while EmotionAttack can hinder it.
EmotionDecode reveals that AI models can comprehend emotional stimuli akin to the mechanism of dopamine in the human brain.
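As a rough illustration of the EmotionPrompt idea summarized above, an emotional stimulus sentence is appended to an otherwise unchanged task prompt and performance is compared with and without it; the stimulus wording and the task below are assumptions made for illustration, not the paper's exact setup.

```python
# EmotionPrompt-style manipulation (illustrative): append an emotional
# stimulus sentence to the task prompt and compare model behaviour with
# and without it. The wording is an example, not the paper's stimulus list.

BASE_PROMPT = "Decide whether the following review is positive or negative:\n{review}"
EMOTIONAL_STIMULUS = "This is very important to my career."  # assumed example stimulus

def build_prompts(review: str) -> tuple[str, str]:
    """Return the plain prompt and its EmotionPrompt-augmented variant."""
    plain = BASE_PROMPT.format(review=review)
    emotional = f"{plain}\n{EMOTIONAL_STIMULUS}"
    return plain, emotional
```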
arXiv Detail & Related papers (2023-12-18T11:19:45Z)
- Socratis: Are large multimodal models emotionally aware? [63.912414283486555]
Existing emotion prediction benchmarks do not consider the diversity of emotions that an image and text can elicit in humans due to various reasons.
We propose Socratis, a societal reactions benchmark, where each image-caption (IC) pair is annotated with multiple emotions and the reasons for feeling them.
We benchmark the capability of state-of-the-art multimodal large language models to generate the reasons for feeling an emotion given an IC pair.
arXiv Detail & Related papers (2023-08-31T13:59:35Z)
- Language-Specific Representation of Emotion-Concept Knowledge Causally Supports Emotion Inference [44.126681295827794]
This study used a form of artificial intelligence known as large language models (LLMs) to assess whether language-based representations of emotion causally contribute to the AI's ability to generate inferences about the emotional meaning of novel situations.
Our findings provide a proof of concept that even an LLM can learn about emotions in the absence of sensory-motor representations, and highlight the contribution of language-derived emotion-concept knowledge to emotion inference.
arXiv Detail & Related papers (2023-02-19T14:21:33Z)
- HICEM: A High-Coverage Emotion Model for Artificial Emotional Intelligence [9.153146173929935]
Next-generation artificial emotional intelligence (AEI) is taking center stage to address users' desire for deeper, more meaningful human-machine interaction.
Unlike theories of emotion, which have been the historical focus in psychology, emotion models are descriptive tools.
This work has broad implications in social robotics, human-machine interaction, mental healthcare, and computational psychology.
arXiv Detail & Related papers (2022-06-15T15:21:30Z)
- Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration.
We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions.
The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z)
- Building Human-like Communicative Intelligence: A Grounded Perspective [1.0152838128195465]
After making astounding progress in language learning, AI systems seem to be approaching a ceiling that does not reflect important aspects of human communicative capacities.
This paper suggests that the dominant cognitively-inspired AI directions, based on nativist and symbolic paradigms, lack necessary substantiation and concreteness to guide progress in modern AI.
I propose a list of concrete, implementable components for building "grounded" linguistic intelligence.
arXiv Detail & Related papers (2022-01-02T01:43:24Z)
- Stimuli-Aware Visual Emotion Analysis [75.68305830514007]
We propose a stimuli-aware visual emotion analysis (VEA) method consisting of three stages, namely stimuli selection, feature extraction and emotion prediction.
To the best of our knowledge, this is the first time a stimuli selection process has been introduced into VEA in an end-to-end network.
Experiments demonstrate that the proposed method consistently outperforms the state-of-the-art approaches on four public visual emotion datasets.
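A minimal sketch of the three-stage design named above (stimuli selection, feature extraction, emotion prediction) is given below; the layer choices, sizes, and attention-based selection are assumptions made for illustration, not the architecture reported in the paper.

```python
# Illustrative three-stage pipeline: (1) select emotion-eliciting stimuli
# via a soft spatial attention map, (2) extract features from the attended
# image, (3) predict emotion logits. Sizes and layers are assumptions only.

import torch
import torch.nn as nn

class StimuliAwareVEA(nn.Module):
    def __init__(self, num_emotions: int = 8):
        super().__init__()
        # Stage 1: stimuli selection as a 1-channel spatial attention map.
        self.stimuli_selector = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1), nn.Sigmoid(),
        )
        # Stage 2: feature extraction from the attended image.
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Stage 3: emotion prediction head.
        self.classifier = nn.Linear(32, num_emotions)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        attention = self.stimuli_selector(images)    # (B, 1, H, W)
        attended = images * attention                # emphasize stimuli regions
        features = self.feature_extractor(attended)  # (B, 32)
        return self.classifier(features)             # emotion logits
```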
arXiv Detail & Related papers (2021-09-04T08:14:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.