E-THER: A Multimodal Dataset for Empathic AI - Towards Emotional Mismatch Awareness
- URL: http://arxiv.org/abs/2509.02100v2
- Date: Mon, 08 Sep 2025 08:37:27 GMT
- Title: E-THER: A Multimodal Dataset for Empathic AI - Towards Emotional Mismatch Awareness
- Authors: Sharjeel Tahir, Judith Johnson, Jumana Abu-Khalaf, Syed Afaq Ali Shah
- Abstract summary: E-THER is the first Person-Centered Therapy-grounded multimodal dataset with multidimensional annotations for verbal-visual incongruence detection. We show that our incongruence-trained models outperform general-purpose models in critical traits.
- Score: 3.8298581733964903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A prevalent shortfall among current empathic AI systems is their inability to recognize when verbal expressions may not fully reflect underlying emotional states. This is because the existing datasets used to train these systems focus on surface-level emotion recognition without addressing the complex verbal-visual incongruence (mismatch) patterns useful for empathic understanding. In this paper, we present E-THER, the first Person-Centered Therapy-grounded multimodal dataset with multidimensional annotations for verbal-visual incongruence detection, enabling training of AI systems that develop genuine rather than performative empathic capabilities. The annotations included in the dataset are drawn from a humanistic approach, i.e., identifying verbal-visual emotional misalignment in client-counsellor interactions, and form a framework for training and evaluating AI on empathy tasks. Additional engagement scores provide behavioral annotations for research applications. Notable gains in empathic and therapeutic conversational qualities are observed in state-of-the-art vision-language models (VLMs), such as IDEFICS and VideoLLAVA, using evaluation metrics grounded in empathic and therapeutic principles. Empirical findings indicate that our incongruence-trained models outperform general-purpose models in critical traits, such as sustaining therapeutic engagement, minimizing artificial or exaggerated linguistic patterns, and maintaining fidelity to the PCT theoretical framework.
Related papers
- Memory-guided Prototypical Co-occurrence Learning for Mixed Emotion Recognition [56.00118641432005]
We propose a Memory-guided Prototypical Co-occurrence Learning framework that explicitly models emotion co-occurrence patterns. Inspired by human cognitive memory systems, we introduce a memory retrieval strategy to extract semantic-level co-occurrence associations. Our model learns affectively informative representations for accurate emotion distribution prediction.
arXiv Detail & Related papers (2026-02-24T04:11:25Z)
- Reflecting Twice before Speaking with Empathy: Self-Reflective Alternating Inference for Empathy-Aware End-to-End Spoken Dialogue [53.95386201009769]
We introduce EmpathyEval, a descriptive natural-language-based evaluation model for assessing empathetic quality in spoken dialogues. We propose ReEmpathy, an end-to-end Spoken Language Model that enhances empathetic dialogue through a novel Empathetic Self-Reflective Alternating Inference mechanism.
arXiv Detail & Related papers (2026-01-26T09:04:50Z)
- Empathy Applicability Modeling for General Health Queries [16.390464387095175]
We introduce the Empathy Applicability Framework (EAF), a theory-driven approach that classifies patient queries in terms of the applicability of emotional reactions and interpretations. EAF provides a framework for identifying empathy needs before response generation and establishes a benchmark for anticipatory empathy modeling.
arXiv Detail & Related papers (2026-01-14T18:47:02Z)
- Do You Understand How I Feel?: Towards Verified Empathy in Therapy Chatbots [2.0452773268886126]
This paper envisions a framework integrating natural language processing and formal verification to deliver empathetic therapy chatbots. A Transformer-based model extracts dialogue features, which are then translated into a Hybrid Automaton model of dyadic therapy sessions. Empathy-related properties can then be verified through Statistical Model Checking. Preliminary results show that the formal model captures therapy dynamics with good fidelity and that ad-hoc strategies improve the probability of satisfying empathy requirements.
arXiv Detail & Related papers (2026-01-13T12:08:58Z)
- SENSE-7: Taxonomy and Dataset for Measuring User Perceptions of Empathy in Sustained Human-AI Conversations [13.232694774856931]
We propose a human-centered taxonomy that emphasizes observable empathic behaviors. We introduce a new dataset, Sense-7, of real-world conversations between information workers and Large Language Models (LLMs). Analysis of 695 conversations from 109 participants reveals that empathy judgments are highly individualized, context-sensitive, and vulnerable to disruption.
arXiv Detail & Related papers (2025-09-19T21:32:24Z)
- Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer [78.35816158511523]
We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification.
We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
arXiv Detail & Related papers (2024-04-26T07:30:32Z)
- Probabilistic emotion and sentiment modelling of patient-reported experiences [0.04096453902709291]
This study introduces a novel methodology for modelling patient emotions from online patient experience narratives.
We employ metadata network topic modelling to analyse patient-reported experiences from Care Opinion.
We develop a probabilistic, context-specific emotion recommender system capable of predicting both multilabel emotions and binary sentiments.
arXiv Detail & Related papers (2024-01-09T05:39:20Z)
- Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition [28.881092401807894]
In paralinguistic analysis for emotion detection from speech, emotions have been identified with discrete or dimensional (continuous-valued) labels.
We propose a model to jointly predict continuous and discrete emotional attributes.
arXiv Detail & Related papers (2022-10-29T16:12:31Z)
- Affect-DML: Context-Aware One-Shot Recognition of Human Affect using Deep Metric Learning [29.262204241732565]
Existing methods assume that all emotions-of-interest are given a priori as annotated training examples.
We conceptualize one-shot recognition of emotions in context, a new problem aimed at recognizing human affect states at a finer-grained level from a single support sample.
All variants of our model clearly outperform the random baseline, while leveraging the semantic scene context consistently improves the learnt representations.
arXiv Detail & Related papers (2021-11-30T10:35:20Z)
- Few-shot Learning in Emotion Recognition of Spontaneous Speech Using a Siamese Neural Network with Adaptive Sample Pair Formation [11.592365534228895]
This paper proposes a few-shot learning approach for automatically recognizing emotion in spontaneous speech from a small number of labelled samples.
Few-shot learning is implemented via a metric learning approach through a siamese neural network.
Results indicate the feasibility of the proposed metric learning in recognizing emotions from spontaneous speech in four datasets.
arXiv Detail & Related papers (2021-09-07T08:04:02Z)
- Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Interventional Emotion Recognition Network (IERN) to alleviate the negative effects brought by the dataset bias.
A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-26T10:40:59Z)
- Automated Quality Assessment of Cognitive Behavioral Therapy Sessions Through Highly Contextualized Language Representations [34.670548892766625]
A BERT-based model is proposed for automatic behavioral scoring of a specific type of psychotherapy, called Cognitive Behavioral Therapy (CBT).
The model is trained in a multi-task manner in order to achieve higher interpretability.
BERT-based representations are further augmented with available therapy metadata, providing relevant non-linguistic context and leading to consistent performance improvements.
arXiv Detail & Related papers (2021-02-23T09:22:29Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
- Pose-based Body Language Recognition for Emotion and Psychiatric Symptom Interpretation [75.3147962600095]
We propose an automated framework for body language based emotion recognition starting from regular RGB videos.
In collaboration with psychologists, we extend the framework for psychiatric symptom prediction.
Because a specific application domain of the proposed framework may only supply a limited amount of data, the framework is designed to work on a small training set.
arXiv Detail & Related papers (2020-10-30T18:45:16Z)
- Continuous Emotion Recognition via Deep Convolutional Autoencoder and Support Vector Regressor [70.2226417364135]
It is crucial that the machine should be able to recognize the emotional state of the user with high accuracy.
Deep neural networks have been used with great success in recognizing emotions.
We present a new model for continuous emotion recognition based on facial expression recognition.
arXiv Detail & Related papers (2020-01-31T17:47:16Z)