Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning
- URL: http://arxiv.org/abs/2310.19995v2
- Date: Sat, 15 Jun 2024 17:47:22 GMT
- Title: Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning
- Authors: Yasaman Etesam, Özge Nilay Yalçın, Chuxuan Zhang, Angelica Lim,
- Abstract summary: We propose methods to incorporate the emotional reasoning capabilities by constructing "narrative captions" relevant to emotion perception.
We propose two distinct ways to construct these captions using zero-shot classifiers (CLIP) and fine-tuning visual-language models (LLaVA) over human generated descriptors.
Our experiments showed that combining the "Fast" narrative descriptors and "Slow" reasoning of language models is a promising way to achieve emotional theory of mind.
- Score: 0.6749750044497732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emotional theory of mind problem requires facial expressions, body pose, contextual information and implicit commonsense knowledge to reason about the person's emotion and its causes, making it currently one of the most difficult problems in affective computing. In this work, we propose multiple methods to incorporate the emotional reasoning capabilities by constructing "narrative captions" relevant to emotion perception, that includes contextual and physical signal descriptors that focuses on "Who", "What", "Where" and "How" questions related to the image and emotions of the individual. We propose two distinct ways to construct these captions using zero-shot classifiers (CLIP) and fine-tuning visual-language models (LLaVA) over human generated descriptors. We further utilize these captions to guide the reasoning of language (GPT-4) and vision-language models (LLaVa, GPT-Vision). We evaluate the use of the resulting models in an image-to-language-to-emotion task. Our experiments showed that combining the "Fast" narrative descriptors and "Slow" reasoning of language models is a promising way to achieve emotional theory of mind.
Related papers
- Think out Loud: Emotion Deducing Explanation in Dialogues [57.90554323226896]
We propose a new task "Emotion Deducing Explanation in Dialogues" (EDEN)
EDEN recognizes emotion and causes in an explicitly thinking way.
It can help Large Language Models (LLMs) achieve better recognition of emotions and causes.
arXiv Detail & Related papers (2024-06-07T08:58:29Z) - ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains [61.50113532215864]
Causal Emotion Entailment (CEE) aims to identify the causal utterances in a conversation that stimulate the emotions expressed in a target utterance.
Current works in CEE mainly focus on modeling semantic and emotional interactions in conversations.
We introduce a step-by-step reasoning method, Emotion-Cause Reasoning Chain (ECR-Chain), to infer the stimulus from the target emotional expressions in conversations.
arXiv Detail & Related papers (2024-05-17T15:45:08Z) - Contextual Emotion Recognition using Large Vision Language Models [0.6749750044497732]
Achieving human-level recognition of the apparent emotion of a person in real world situations remains an unsolved task in computer vision.
In this paper, we examine two major approaches enabled by recent large vision language models.
We demonstrate that a vision language model, fine-tuned even on a small dataset, can significantly outperform traditional baselines.
arXiv Detail & Related papers (2024-05-14T23:24:12Z) - Emotion Rendering for Conversational Speech Synthesis with Heterogeneous
Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z) - Contextual Emotion Estimation from Image Captions [0.6749750044497732]
We explore whether Large Language Models can support the contextual emotion estimation task, by first captioning images, then using an LLM for inference.
We generate captions and emotion annotations for a subset of 331 images from the EMOTIC dataset.
We find that GPT-3.5, specifically the text-davinci-003 model, provides surprisingly reasonable emotion predictions consistent with human annotations.
arXiv Detail & Related papers (2023-09-22T18:44:34Z) - Affection: Learning Affective Explanations for Real-World Visual Data [50.28825017427716]
We introduce and share with the research community a large-scale dataset that contains emotional reactions and free-form textual explanations for 85,007 publicly available images.
We show that there is significant common ground to capture potentially plausible emotional responses with a large support in the subject population.
Our work paves the way for richer, more human-centric, and emotionally-aware image analysis systems.
arXiv Detail & Related papers (2022-10-04T22:44:17Z) - SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
arXiv Detail & Related papers (2021-10-24T02:41:41Z) - Emotion Recognition under Consideration of the Emotion Component Process
Model [9.595357496779394]
We use the emotion component process model (CPM) by Scherer (2005) to explain emotion communication.
CPM states that emotions are a coordinated process of various subcomponents, in reaction to an event, namely the subjective feeling, the cognitive appraisal, the expression, a physiological bodily reaction, and a motivational action tendency.
We find that emotions on Twitter are predominantly expressed by event descriptions or subjective reports of the feeling, while in literature, authors prefer to describe what characters do, and leave the interpretation to the reader.
arXiv Detail & Related papers (2021-07-27T15:53:25Z) - DialogueCRN: Contextual Reasoning Networks for Emotion Recognition in
Conversations [0.0]
We propose novel Contextual Reasoning Networks (DialogueCRN) to fully understand the conversational context from a cognitive perspective.
Inspired by the Cognitive Theory of Emotion, we design multi-turn reasoning modules to extract and integrate emotional clues.
The reasoning module iteratively performs an intuitive retrieving process and a conscious reasoning process, which imitates human unique cognitive thinking.
arXiv Detail & Related papers (2021-06-03T16:47:38Z) - Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and shows the state-of-the-art result for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.