Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models
- URL: http://arxiv.org/abs/2310.14389v1
- Date: Sun, 22 Oct 2023 19:12:17 GMT
- Title: Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models
- Authors: Hongli Zhan, Desmond C. Ong, Junyi Jessy Li
- Abstract summary: This work fills the gap by presenting CovidET-Appraisals, the most comprehensive dataset to date that assesses 24 appraisal dimensions.
CovidET-Appraisals presents an ideal testbed to evaluate the ability of large language models to automatically assess and explain cognitive appraisals.
- Score: 47.890846082224066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emotions we experience involve complex processes; besides physiological
aspects, research in psychology has studied cognitive appraisals where people
assess their situations subjectively, according to their own values (Scherer,
2005). Thus, the same situation can often result in different emotional
experiences. While the detection of emotion is a well-established task, there
is very limited work so far on the automatic prediction of cognitive
appraisals. This work fills the gap by presenting CovidET-Appraisals, the most
comprehensive dataset to date that assesses 24 appraisal dimensions, each with
a natural language rationale, across 241 Reddit posts. CovidET-Appraisals
presents an ideal testbed to evaluate the ability of large language models --
excelling at a wide range of NLP tasks -- to automatically assess and explain
cognitive appraisals. We found that while the best models are performant,
open-source LLMs fall short at this task, presenting a new challenge in the
future development of emotionally intelligent models. We release our dataset at
https://github.com/honglizhan/CovidET-Appraisals-Public.
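
To make the evaluation setup described above concrete, here is a minimal, hypothetical sketch of how one might prompt an LLM to rate a single appraisal dimension and justify the rating for a Reddit post. The prompt wording, the 1-9 scale, the dimension name, and the example post are assumptions for illustration only; the actual prompts, scales, and dimensions are the ones defined in the paper and the released repository.

```python
# Hypothetical zero-shot prompting sketch for rating one appraisal dimension
# with a natural-language rationale. The scale, wording, and example below are
# illustrative assumptions, not the CovidET-Appraisals protocol itself (see
# https://github.com/honglizhan/CovidET-Appraisals-Public for the real prompts).

from typing import Callable

PROMPT_TEMPLATE = """You will read a Reddit post written during the COVID-19 pandemic.
Rate the following cognitive appraisal dimension on a 1-9 scale and explain your rating.

Dimension: {dimension}
Post: {post}

Answer with a rating (1 = not at all, 9 = extremely) followed by a short rationale."""


def assess_appraisal(llm: Callable[[str], str], post: str, dimension: str) -> str:
    """Query any text-in/text-out model for one appraisal dimension of one post."""
    prompt = PROMPT_TEMPLATE.format(dimension=dimension, post=post)
    return llm(prompt)


if __name__ == "__main__":
    # Stand-in "model" so the sketch runs without any API access.
    def dummy_llm(prompt: str) -> str:
        return "Rating: 7. Rationale: the narrator feels largely unable to control the situation."

    example_post = "I lost my job during lockdown and I don't know how we'll pay rent next month."
    print(assess_appraisal(dummy_llm, example_post,
                           dimension="perceived ability to control the situation"))
```

In the paper's setting, a query of this kind would presumably be issued for each of the 24 dimensions across the 241 posts, with the model's ratings and rationales then compared against the human annotations.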
Related papers
- CAPE: A Chinese Dataset for Appraisal-based Emotional Generation using Large Language Models [30.40159858361768]
We introduce a two-stage automatic data generation framework to create CAPE, a Chinese Cognitive Appraisal theory-based Emotional corpus.
This corpus facilitates the generation of dialogues with contextually appropriate emotional responses by accounting for diverse personal and situational factors.
Our study shows the potential for advancing emotional expression in conversational agents, paving the way for more nuanced and meaningful human-computer interactions.
arXiv Detail & Related papers (2024-10-18T03:33:18Z) - Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating psychological dimensions in LLMs, covering psychological dimension identification, assessment dataset curation, and assessment with result validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z) - ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698]
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking.
We highlight how cognitive biases can lead raters to conflate fluency with truthfulness, and how cognitive uncertainty affects the reliability of rating scales such as Likert.
We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
arXiv Detail & Related papers (2024-05-28T22:45:28Z) - Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided [38.11184388388781]
Large language models (LLMs) have offered new opportunities for emotional support.
This work takes a first step by engaging with cognitive reappraisals.
We conduct a first-of-its-kind expert evaluation of an LLM's zero-shot ability to generate cognitive reappraisal responses.
arXiv Detail & Related papers (2024-04-01T17:56:30Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses drawn from a large set of real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z) - The Confidence-Competence Gap in Large Language Models: A Cognitive
Study [3.757390057317548]
Large Language Models (LLMs) have attracted widespread attention for their performance across diverse domains.
We probe these models with diverse sets of questionnaires and real-world scenarios.
Our findings reveal intriguing instances where models demonstrate high confidence even when they answer incorrectly.
arXiv Detail & Related papers (2023-09-28T03:50:09Z) - An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language
Model Game Agents [0.40964539027092906]
This study tests the capabilities of large language models to solve emotional intelligence tasks and to simulate emotions.
It presents and evaluates a new chain-of-emotion architecture for emotion simulation within video games, based on psychological appraisal research.
arXiv Detail & Related papers (2023-09-10T16:55:49Z) - Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench [83.41621219298489]
We evaluate Large Language Models' (LLMs) anthropomorphic capabilities using the emotion appraisal theory from psychology.
We collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study.
We conduct a human evaluation involving more than 1,200 subjects worldwide.
arXiv Detail & Related papers (2023-08-07T15:18:30Z) - Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy
Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores; a generic sketch of the off-policy-evaluation idea it builds on appears after this list.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.