Rethinking Emotion Annotations in the Era of Large Language Models
- URL: http://arxiv.org/abs/2412.07906v1
- Date: Tue, 10 Dec 2024 20:30:51 GMT
- Title: Rethinking Emotion Annotations in the Era of Large Language Models
- Authors: Minxue Niu, Yara El-Tawil, Amrit Romana, Emily Mower Provost
- Abstract summary: We analyze the complexities of emotion annotation in the context of Large Language Models (LLMs)
In our experiments, GPT-4 achieves high ratings in a human evaluation study, painting a more positive picture than previous work.
To harness GPT-4's strength while preserving human perspective, we explore two ways of integrating GPT-4 into emotion annotation pipelines.
- Abstract: Modern affective computing systems rely heavily on datasets with human-annotated emotion labels for training and evaluation. However, human annotations are expensive to obtain, sensitive to study design, and difficult to quality-control because of the subjective nature of emotions. Meanwhile, Large Language Models (LLMs) have shown remarkable performance on many Natural Language Understanding tasks, emerging as a promising tool for text annotation. In this work, we analyze the complexities of emotion annotation in the context of LLMs, focusing on GPT-4 as a leading model. In our experiments, GPT-4 achieves high ratings in a human evaluation study, painting a more positive picture than previous work, in which human labels served as the only ground truth. On the other hand, we observe differences between human and GPT-4 emotion perception, underscoring the importance of human input in annotation studies. To harness GPT-4's strengths while preserving the human perspective, we explore two ways of integrating GPT-4 into emotion annotation pipelines, showing its potential to flag low-quality labels, reduce the workload of human annotators, and improve downstream model learning performance and efficiency. Together, our findings highlight opportunities for new emotion labeling practices and suggest LLMs as a promising tool to aid human annotation.
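One of the integrations described in the abstract, using LLM labels to flag potentially low-quality human annotations for re-review, can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the `llm_label` function stands in for a GPT-4 call and uses a toy lexicon, and all names and data are hypothetical.

```python
def llm_label(text: str) -> str:
    """Stand-in for a GPT-4 emotion-labeling call (stubbed with a toy lexicon)."""
    lexicon = {"great": "joy", "terrible": "anger", "miss": "sadness"}
    for word, emotion in lexicon.items():
        if word in text.lower():
            return emotion
    return "neutral"

def flag_for_review(samples):
    """Return indices of samples where the human label disagrees with the LLM label.

    Disagreements are candidates for re-annotation rather than automatic
    replacement, preserving the human perspective emphasized in the paper.
    """
    flagged = []
    for i, (text, human_label) in enumerate(samples):
        if llm_label(text) != human_label:
            flagged.append(i)
    return flagged

samples = [
    ("What a great day!", "joy"),
    ("I miss my old friends.", "joy"),       # suspicious human label
    ("This is terrible service.", "anger"),
]
print(flag_for_review(samples))  # → [1]
```

In a real pipeline, only the flagged subset would be routed back to human annotators, reducing workload while keeping humans in the loop.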
Related papers
- MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z)
- From Text to Emotion: Unveiling the Emotion Annotation Capabilities of LLMs [12.199629860735195]
We compare GPT-4 with supervised models and humans in three aspects: agreement with human annotations, alignment with human perception, and impact on model training.
We find that common metrics that use aggregated human annotations as ground truth can underestimate the performance of GPT-4.
arXiv Detail & Related papers (2024-08-30T05:50:15Z)
- GPT-4 Emulates Average-Human Emotional Cognition from a Third-Person Perspective [1.642094639107215]
We first look at carefully crafted emotion-evoking stimuli, originally designed to find patterns of brain neural activity.
We show that GPT-4 is especially accurate in reasoning about such stimuli.
We find that GPT-4's interpretations align more closely with human judgments about the emotions of others than with self-assessments.
arXiv Detail & Related papers (2024-08-11T01:22:09Z)
- GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing [74.68232970965595]
Multimodal large language models (MLLMs) are designed to process and integrate information from multiple sources, such as text, speech, images, and videos.
This paper assesses the application of MLLMs to five crucial abilities for affective computing, spanning visual affective tasks and reasoning tasks.
arXiv Detail & Related papers (2024-03-09T13:56:25Z)
- Human vs. LMMs: Exploring the Discrepancy in Emoji Interpretation and Usage in Digital Communication [68.40865217231695]
This study examines the behavior of GPT-4V in replicating human-like use of emojis.
The findings reveal a discernible discrepancy between human and GPT-4V behaviors, likely due to the subjective nature of human interpretation.
arXiv Detail & Related papers (2024-01-16T08:56:52Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition [38.2581985358104]
GPT-4 with Vision (GPT-4V) has demonstrated remarkable visual capabilities across various tasks, but its performance in emotion recognition has not been fully evaluated.
We present the quantitative evaluation results of GPT-4V on 21 benchmark datasets covering 6 tasks.
arXiv Detail & Related papers (2023-12-07T13:27:37Z)
- What's Next in Affective Modeling? Large Language Models [3.0902630634005797]
GPT-4 performs well across multiple emotion tasks.
It can distinguish between emotion theories and generate emotional stories.
We suggest that LLMs could play an important role in affective modeling.
arXiv Detail & Related papers (2023-10-03T16:39:20Z)
- Large Language Models Understand and Can be Enhanced by Emotional Stimuli [53.53886609012119]
We take the first step towards exploring the ability of Large Language Models to understand emotional stimuli.
Our experiments show that LLMs have a grasp of emotional intelligence, and their performance can be improved with emotional prompts.
Our human study results demonstrate that EmotionPrompt significantly boosts the performance of generative tasks.
arXiv Detail & Related papers (2023-07-14T00:57:12Z)
- Does Conceptual Representation Require Embodiment? Insights From Large Language Models [9.390117546307042]
We compare representations of 4,442 lexical concepts between humans and ChatGPT models (GPT-3.5 and GPT-4).
We identify two main findings: 1) Both models strongly align with human representations in non-sensorimotor domains but lag in sensory and motor areas, with GPT-4 outperforming GPT-3.5; 2) GPT-4's gains are associated with its additional visual learning, which also appears to benefit related dimensions like haptics and imageability.
arXiv Detail & Related papers (2023-05-30T15:06:28Z)
- Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and achieves state-of-the-art results for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.