Socratis: Are large multimodal models emotionally aware?
- URL: http://arxiv.org/abs/2308.16741v3
- Date: Thu, 2 Nov 2023 17:37:53 GMT
- Title: Socratis: Are large multimodal models emotionally aware?
- Authors: Katherine Deng, Arijit Ray, Reuben Tan, Saadia Gabriel, Bryan A. Plummer, Kate Saenko
- Abstract summary: Existing emotion prediction benchmarks use coarse labels that do not capture the diversity of emotions an image and text can elicit in humans for various reasons.
We propose Socratis, a societal reactions benchmark, where each image-caption (IC) pair is annotated with multiple emotions and the reasons for feeling them.
We benchmark the capability of state-of-the-art multimodal large language models to generate the reasons for feeling an emotion given an IC pair.
- Score: 63.912414283486555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing emotion prediction benchmarks contain coarse emotion labels which do
not consider the diversity of emotions that an image and text can elicit in
humans due to various reasons. Learning diverse reactions to multimodal content
is important as intelligent machines take a central role in generating and
delivering content to society. To address this gap, we propose Socratis, a
societal reactions benchmark, where each image-caption (IC) pair is annotated
with multiple emotions and the reasons for feeling them. Socratis contains 18K
free-form reactions for 980 emotions on 2075 image-caption pairs from 5
widely-read news and image-caption (IC) datasets. We benchmark the capability
of state-of-the-art multimodal large language models to generate the reasons
for feeling an emotion given an IC pair. Based on a preliminary human study, we
observe that humans prefer human-written reasons more than twice as often as
machine-generated ones. This suggests our task is harder than standard
generation tasks; it stands in stark contrast to recent findings that humans
cannot, for instance, tell apart machine-written from human-written news
articles. We further see that
current captioning metrics based on large vision-language models also fail to
correlate with human preferences. We hope that these findings and our benchmark
will inspire further research on training emotionally aware models.
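To make the benchmark setup described above concrete, here is a minimal sketch of how a Socratis-style record and the reason-generation task could be organized. It is illustrative only, not the authors' released code: the field names, prompt wording, and the `generate_reason` stub are assumptions.

```python
# Illustrative sketch of a Socratis-style example and the reason-generation
# task; field names, prompt wording, and the model stub are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class SocratisExample:
    image_path: str      # image of the image-caption (IC) pair
    caption: str         # accompanying caption, e.g., from a news dataset
    emotions: List[str]  # multiple emotions annotators reported feeling
    reasons: List[str]   # free-form, human-written reasons for those emotions

def build_prompt(example: SocratisExample, emotion: str) -> str:
    """Compose a prompt asking why this IC pair elicits the given emotion."""
    return (
        f"Caption: {example.caption}\n"
        f"A reader feels '{emotion}' when seeing this image and caption. "
        f"Explain briefly why they might feel that way."
    )

def generate_reason(image_path: str, prompt: str) -> str:
    """Placeholder for a call to a multimodal large language model."""
    return "model-generated reason goes here"

# Toy usage: ask the (stubbed) model for a reason per annotated emotion.
ex = SocratisExample(
    image_path="example.jpg",
    caption="Flood waters rise in a coastal town after the storm.",
    emotions=["sadness", "fear"],
    reasons=["People are losing their homes.", "The storm may return."],
)
for emo in ex.emotions:
    print(emo, "->", generate_reason(ex.image_path, build_prompt(ex, emo)))
```

A human-preference study like the one reported above would then show annotators the machine-generated reason alongside the corresponding human-written one and tally which is preferred.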
Related papers
- MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z)
- Improved Emotional Alignment of AI and Humans: Human Ratings of Emotions Expressed by Stable Diffusion v1, DALL-E 2, and DALL-E 3 [10.76478480925475]
Generative AI systems are increasingly capable of expressing emotions via text and imagery.
We measure the alignment between emotions expressed by generative AI and human perceptions.
We show that the alignment significantly depends upon the AI model used and the emotion itself.
arXiv Detail & Related papers (2024-05-28T18:26:57Z)
- Contextual Emotion Recognition using Large Vision Language Models [0.6749750044497732]
Achieving human-level recognition of the apparent emotion of a person in real world situations remains an unsolved task in computer vision.
In this paper, we examine two major approaches enabled by recent large vision language models.
We demonstrate that a vision language model, fine-tuned even on a small dataset, can significantly outperform traditional baselines.
arXiv Detail & Related papers (2024-05-14T23:24:12Z)
- Self context-aware emotion perception on human-robot interaction [3.775456992482295]
Humans take contextual information into account, and different contexts can lead to completely different emotional expressions.
We introduce self context-aware model (SCAM) that employs a two-dimensional emotion coordinate system for anchoring and re-labeling distinct emotions.
This approach has yielded significant improvements across audio, video, and multimodal environments.
arXiv Detail & Related papers (2024-01-18T10:58:27Z)
- The Good, The Bad, and Why: Unveiling Emotions in Generative AI [73.94035652867618]
We show that EmotionPrompt can boost the performance of AI models while EmotionAttack can hinder it.
EmotionDecode reveals that AI models can comprehend emotional stimuli akin to the mechanism of dopamine in the human brain.
arXiv Detail & Related papers (2023-12-18T11:19:45Z)
- Multi-Branch Network for Imagery Emotion Prediction [4.618814297494939]
We present a novel Multi-Branch Network (MBN) to predict both discrete and continuous emotions in an image.
Our proposed method significantly outperforms state-of-the-art methods with 28.4% in mAP and 0.93 in MAE.
arXiv Detail & Related papers (2023-12-12T18:34:56Z)
- Language Models (Mostly) Do Not Consider Emotion Triggers When Predicting Emotion [87.18073195745914]
We investigate how well human-annotated emotion triggers correlate with features deemed salient in their prediction of emotions.
Using EmoTrigger, we evaluate the ability of large language models to identify emotion triggers.
Our analysis reveals that emotion triggers are largely not considered salient features by emotion prediction models; instead, there is an intricate interplay between various features and the task of emotion detection.
arXiv Detail & Related papers (2023-11-16T06:20:13Z)
- HICEM: A High-Coverage Emotion Model for Artificial Emotional Intelligence [9.153146173929935]
Next-generation artificial emotional intelligence (AEI) is taking center stage to address users' desire for deeper, more meaningful human-machine interaction.
Unlike theories of emotion, which have been the historical focus in psychology, emotion models are descriptive tools.
This work has broad implications in social robotics, human-machine interaction, mental healthcare, and computational psychology.
arXiv Detail & Related papers (2022-06-15T15:21:30Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
- ProxEmo: Gait-based Emotion Learning and Multi-view Proxemic Fusion for Socially-Aware Robot Navigation [65.11858854040543]
We present ProxEmo, a novel end-to-end emotion prediction algorithm for robot navigation among pedestrians.
Our approach predicts the perceived emotions of a pedestrian from walking gaits, which is then used for emotion-guided navigation.
arXiv Detail & Related papers (2020-03-02T17:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.