Related papers: Multi-dimensional Evaluation of Empathetic Dialog Responses

Multi-dimensional Evaluation of Empathetic Dialog Responses

URL: http://arxiv.org/abs/2402.11409v2
Date: Tue, 16 Apr 2024 16:34:02 GMT
Title: Multi-dimensional Evaluation of Empathetic Dialog Responses
Authors: Zhichao Xu, Jiepu Jiang,
Abstract summary: We propose a multi-dimensional empathy evaluation framework to measure both expressed intents from the speaker's perspective and perceived empathy from the listener's perspective. We find the two dimensions are inter-connected, and perceived empathy has a high correlation with dialogue satisfaction levels.
Score: 4.580983642743026
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Empathy is critical for effective and satisfactory conversational communication. Prior efforts to measure conversational empathy mostly focus on expressed communicative intents -- that is, the way empathy is expressed. Yet, these works ignore the fact that conversation is also a collaboration involving both speakers and listeners. In contrast, we propose a multi-dimensional empathy evaluation framework to measure both expressed intents from the speaker's perspective and perceived empathy from the listener's perspective. We apply our proposed framework to analyze our internal customer-service dialogue. We find the two dimensions (expressed intent types and perceived empathy) are inter-connected, and perceived empathy has a high correlation with dialogue satisfaction levels. To reduce the annotation cost, we explore different options to automatically measure conversational empathy: prompting LLMs and training language model-based classifiers. Our experiments show that prompting methods with even popular models like GPT-4 and Flan family models perform relatively poorly on both public and our internal datasets. In contrast, instruction-finetuned classifiers based on Flan-T5 family models outperform prior works and competitive baselines. We conduct a detailed ablation study to give more insights into instruction finetuning method's strong performance.

Related papers

Aligning Spoken Dialogue Models from User Interactions [55.192134724622235]
We propose a novel preference alignment framework to improve spoken dialogue models on realtime conversations from user interactions.<n>We create a dataset of more than 150,000 preference pairs from raw multi-turn speech conversations annotated with AI feedback.<n>Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
arXiv Detail & Related papers (2025-06-26T16:45:20Z)
Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data [33.85748258158527]
Empathetic dialogue is crucial for natural human-computer interaction. Large language models (LLMs) have revolutionized dialogue generation by harnessing their powerful capabilities. We propose a novel approach that circumvents the need for question-answering data.
arXiv Detail & Related papers (2025-01-19T04:10:53Z)
Can LLMs Understand the Implication of Emphasized Sentences in Dialogue? [64.72966061510375]
Emphasis is a crucial component in human communication, which indicates the speaker's intention and implication beyond pure text in dialogue. This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis. We evaluate various Large Language Models (LLMs), both open-source and commercial, to measure their performance in understanding emphasis.
arXiv Detail & Related papers (2024-06-16T20:41:44Z)
deep learning of segment-level feature representation for speech emotion recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method to deal with capturing attentive contextual dependency and speaker-sensitive interactions. First, we use a pretrained VGGish model to extract segment-based audio representation in individual utterances. Second, an attentive bi-directional recurrent unit (GRU) models contextual-sensitive information and explores intra- and inter-speaker dependencies jointly.
arXiv Detail & Related papers (2023-02-05T16:15:46Z)
EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments [75.11753644302385]
Empathetic conversational agents should not only understand what is being discussed, but also acknowledge the implied feelings of the conversation partner. We propose a method based on a transformer pretrained language model (T5) We evaluate our model on the EmpatheticDialogues dataset using both automated metrics and human evaluation.
arXiv Detail & Related papers (2021-10-30T19:04:48Z)
Constructing Emotion Consensus and Utilizing Unpaired Data for Empathetic Dialogue Generation [22.2430593119389]
We propose a dual-generative model, Dual-Emp, to simultaneously construct the emotion consensus and utilize some external unpaired data. Our method outperforms competitive baselines in producing coherent and empathetic responses.
arXiv Detail & Related papers (2021-09-16T07:57:01Z)
Exemplars-guided Empathetic Response Generation Controlled by the Elements of Human Communication [88.52901763928045]
We propose an approach that relies on exemplars to cue the generative model on fine stylistic properties that signal empathy to the interlocutor. We empirically show that these approaches yield significant improvements in empathetic response quality in terms of both automated and human-evaluated metrics.
arXiv Detail & Related papers (2021-06-22T14:02:33Z)
DynaEval: Unifying Turn and Dialogue Level Evaluation [60.66883575106898]
We propose DynaEval, a unified automatic evaluation framework. It is capable of performing turn-level evaluation, but also holistically considers the quality of the entire dialogue. Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model.
arXiv Detail & Related papers (2021-06-02T12:23:18Z)
BiERU: Bidirectional Emotional Recurrent Unit for Conversational Sentiment Analysis [18.1320976106637]
The main difference between conversational sentiment analysis and single sentence sentiment analysis is the existence of context information. Existing approaches employ complicated deep learning structures to distinguish different parties in a conversation and then model the context information. We propose a fast, compact and parameter-efficient party-ignorant framework named bidirectional emotional recurrent unit for conversational sentiment analysis.
arXiv Detail & Related papers (2020-05-31T11:13:13Z)
Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances. We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
arXiv Detail & Related papers (2020-05-01T20:01:39Z)
You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
The research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation. Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.